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ABSTRACT 



We examine two categories of solution algorithms for the 
large-scale multicommodity transshipment problem (MCTP) : 
resource direction and price direction. In the former 
category we construct RDLB , a new algorithm which uses a 
simplified projection method in the subgradient capacity 
reallocations and conjugate subgradient directions with 
approximate line search to provide better termination 
conditions in the Lagrangean lower-bounding iteration. In 
the latter category, we develop DDC , a dual decomposition, 
and we introduce RSD(P) and RSD(A), new non-linear 
decomposition algorithms for the MCTP based on penalty 
transformations of the original problem and using restricted 
simplicial decomposition. 

Computational results are presented for four- and ten- 
product versions of a problem with an underlying network of 
3,300 nodes and 10,400 arcs. Results show RDLB stalls 
before reaching optimality, apparently a common problem in 
primal subgradient reallocations, while the RSD algorithms 
reach near-optimal solutions up to 10 times faster than a 
direct primal simplex-based solver, and display very 
favorable convergence rates compared to DDC. As a final 
test, RSD (A) and DDC are applied to a 100-product problem 
totaling 330,000 nodes and 1,040,000 arcs. RSD (A) reaches 



an acceptable solution within 4% of optimality in under 17 
minutes, while DDC terminates after 68 minutes with a 12% 
gap remaining around the optimal solution. 
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I . 



INTRODUCTION 



Minimum-cost single-commodity capacitated transshipment 
problems are now solved routinely using primal network 
simplex codes. See, for instance, (Bradley, Brown, and 
Graves, 1977). However, when multiple products flow over 
the same network, the pure network structure can be 
confounded by the presence of constraints limiting the total 
flow of all products on each arc. Thus, single-commodity 
solvers may not be applied directly, and, as the number of 
products increases, the size of the constraint matrix grows 
so rapidly that conventional simplex solvers become useless. 

Because it is a frequently encountered problem, 
specialized algorithms for the multicommodity transshipment 
problem (MCTP) have been widely studied and reported in the 
literature. However, none of the methods has been so 
generally successful that it dominates other methods, as 
primal network algorithms do in the single-commodity arena. 

This paper documents new algorithms for the MCTP which 
fall into the broad categories of "resource-directive" and 
"price-directive." The first algorithm is an enhancement of 
a popular resource-directive approach using subgradient 
optimization, but incorporating a simplified primal 
projection together with conjugate subgradient directions in 
the Lagrangean lower bound steps. The algorithm promises 
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ease of computation in the primal restriction and improved 
criteria for termination over previous algorithms, but 
apparently shares a common problem of subgradient-based 
resource-direction: stalling in the primal before reaching 
optimality. 

The set of price-directive algorithms we present 
includes a variant of dual (Dantzig-Wolfe) decomposition and 
a new family of decomposition algorithms using a penalty 
transformation of the original MCTP to create a non-linear 
master problem which is solved by restricted simplicial 
decomposition. This is a unique approach to the MCTP not 
previously attempted which exhibits superior convergence 
rates in computational tests compared to other solution 
methods. The algorithm is quite general and therefore 
applicable to a wide variety of linear programs exhibiting 
complicating constraints. 

We introduce the MCTP in the following section, and then 
give an overview of major solution approaches to the MCTP in 
Section B, and a more detailed literature review in Section 
C. The resource-directive algorithm is presented in Chapter 
II and the price-directive algorithms in Chapter III. 
Computational results appear in Chapter IV and Chapter V 
presents conclusions and areas for future research. 
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A. STATEMENT OF THE PROBLEM 



The general form of linear program in which we are 
interested is: 



where one set of constraints, A 2 x = b 2 , 0 < x < b 3 is 
deemed to be easy to solve, and a second set, A^^x < bj_, 
complicates the problem. The vector u = (u 1 ,u 2 ) is the set 
of dual multipliers associated with the constraints with 
u x < 0 and u 2 unrestricted in sign. 

The following notation will be used throughout this 
presentation. We assume all products flow over the same 
underlying network. Let |Z| denote the cardinality of a set 
Z. 

T = {I,J} is a transshipment network with a set of nodes, 
I, and a set of arcs, J. 

P is the set of products flowing on T. 

i € I is a node of the network. 

j £ J is an arc in the network. 

p s P is a product using network T. 

Np is an ( | 1 1 X |J|) node-arc incidence matrix for each 
product (N 3 = ... = N i p | ) . 

N is an ( | I | * | P | ) x ( | J | • | P | ) matrix with the N p matrices 
along the diagonal, 0's elsewhere. 



(LP) 



min cx 



(duals) 



st A^x < b^ 
A 2 x = b 2 
0 < x < b 3 



(u x ) 

(u 2 ) 
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c = { c x , . . . , Cp, . . . , c | p | } is a vector of costs on the arcs, 
length | J | * | P | . 

x = {X]_, . . . ,x p , . . . ,Xi pi } is a vector of flows on the arcs, 
length | J] * | P | . 



b;L is a vector of joint capacities with length |J| 
b 2 = right hand side with 



b 2pi > 


0 if product p has 


a supply at node i, 




b 2pi < 


0 if product p has 


a demand at node i, 


and 


b 2pi = 


0 otherwise. 






A is a | J | 


x | J | * | P | matrix = 


(I,...,I). 




We specialize LP to MCTP by 


letting A 2 = N, b 3 = 


: b^, and 


dropping the 


subscript on A}/. 






(MCTP) 


min cx 


(duals) 


(1.1) 




st Ax < b-L 


(u x ) 


(1.2) 




Nx = b 2 


(u 2 ) 


(1.3) 




o < Xp < b x 


for all p e P . 


(1.4) 


The easy constraints are the 


single-product pure 


network 



flow constraints (1.3, 1.4), and the complicating constraints 
are the joint capacitation constraints (1.2). We assume for 
notational simplicity that all arcs have joint capacities, 
although in practice only a subset have such restrictions. 
For convenience, let 

F = {x|Nx = b 2 , 0 < Xp < b-L for all p e P} 

be the set of all feasible single-commodity flows with joint 
capacity constraints relaxed. In the absence of the 



11 



complicating these constraints (1.2), the MCTP decouples 
into a set of independent single commodity networks. 

The Lagrangean dual of MCTP with respect to the joint 
capacity constraints is found by placing these constraints 
in the objective with multipliers u^ < 0: 

(LR) max min L(u;l,x) = cx - u^Ax-b^ 

u 1 <0 x>0 

st x e F 

According to duality theory, if x* solves MCTP, then L^u^x) 
< cx* for u^ < 0 and x e F. Furthermore, a solution 

(u x **,x**) to LR has L(u^**,x**) = cx*. Thus, we may use LR 
to generate a lower bound on MCTP by fixing U;l < 0 and 
solving the following problem: 

( LR (u^ ) ) min cx - u^fAx-b^) 

x 

St X € F , 

and this bound will be tight if u^ is chosen correctly. 
Solving the Lagrangean dual generally allows some 

constraints to be violated with penalty u 1 (Ax-b^). For any 
x € F, we denote the set of violated constraints as 

Jy = (jlj € J > A j x > b lj> • (1-5) 
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B. OVERVIEW OF SOLUTION TECHNIQUES 

Solution methods for large-scale linear programs may be 
divided into three broad categories: direct factorization 

or compact inverse methods, indirect resource-directive 
methods, and indirect price-directive methods. Factoriza- 
tion approaches exploit specific structure inherent within 
the constraints to produce a compact basis representation 
with which the steps of a primal or dual simplex algorithm 
may be performed. Direct factorization is not pursued in 
this study (see, for instance, Graves and McBride, 1976) . 

The other two approaches, resource and price direction 
are decompositions which divide the original problem into a 
master problem and subproblems which exchange information to 
solve the original problem indirectly by iteration. 

The resource-directive approach solves the MCTP as a two 
part minimization: 

min min cx 
y>0 x>0 

st Nx = b 2 

x - y <0 
Ay = b-L 

yp < b;L for all p e P . 

The vector y allocates joint capacities to the individual 
commodities. For fixed y, the solution to the inner 
minimization is 
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(RS (y ) ) 



V(y) = min cx 



st Nx = b 2 
0 < x < y 

which is the "subproblem" or "subproblems" since the 
individual commodities are no longer coupled and may be 
solved independently. 

The outer minimization can now be written as 

(RD) min V(y) 

st Ay = b^ 

y p < hi for all p e P 

y > 0 . 

Standard methods for solving (RD) are Benders 

decomposition and subgradient optimization. Benders 

decomposition creates a master problem which makes 

successive tangential approximations to V(y); the tangent 
planes are derived by solving subproblems (RS(y)) whose 
capacity (resource) allocations are determined by the master 
problem. More details of Benders decomposition are given in 
Chapter III but the method is not pursued because the master 
problems become unwieldy. 

Subgradient optimization solves (RD) in a fashion which 
is analogous to a projected gradient algorithm which could 
be used if V(y) were differentiable. Given a feasible 
allocation y^ in the kth iteration, a feasible reallocation 
is obtained by 



14 



yk+1 _ yk + 



where is a scalar step length and d^ is a feasible 
direction. Since V(y) is not everywhere differentiable, d k 
is a projected subgradient rather than a projected gradient. 
Standard subgradient optimization does not use a line search 
to determine s^ since d^ is not guaranteed to be a descent 
direction. However, we devise an algorithm which performs 
an approximate line search when applicable in the lower 
bound routine. Subgradients are determined via solutions of 
the subproblems RS(y); the master problem of this procedure 
is the reallocation mechanism. 

We present price direction as a class of penalty 
problems , 



max min cx + P(z,Ax-b;L) (1.6) 

z>0 x 

st x £ F 

where P(*) is a penalty term involving those constraints 
deemed to be complicating. The vector z is determined 
differently for various forms of P(*), but in general is 
involved in constructing an optimal vector of dual 
multipliers . 

Letting P(u 1 ,Ax-b^) = -u^Ax-b^, we see that (1-6) 
becomes LR, -z = u^ estimates the optimal dual multipliers, 
and by fixing < 0, the inner minimization amounts to 

solving the Lagrangean subproblem LR(u 1 ) . 
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Solution of the Lagrangean dual, i.e., (1.6) with linear 

penalties, cannot guarantee an optimal solution to MCTP 
although optimal u^* and x* for MCTP are in the set of 
solutions to LR. Nevertheless, optimizing LR is useful for 
bounding purposes and it is sometimes possible to obtain 
good solutions to MCTP from LR. The method used in this 
work to solve LR is subgradient optimization. 

Subgradient optimization for LR simply updates the 
multiplier estimates at each iteration k, by the formula 

U! k+1 = min (OfU]^ - s k (Ax k -bi)) 

while controlling the scalar steplengths s k , resolving the 
Lagrangean to obtain a new x k+1 and seeking feasibility only 
indirectly via the penalty terms. The dual update mechanism 
is the master problem for LR while the subproblems are 
LR(u^ k ) . 

Standard dual (Dantzig-Wolfe) decomposition may be 
interpreted as a special method of solving (1.6) with 
piecewise-linear penalty function, 

P(z,Ax-b 1 ) = -z(Ax-b 1 ) 
and Zj = 00 if (Ax-b-^j > 0 

Zj = 0 if (Ax-b^) j < 0 

Zj [O, 00 ) if (Ax-b^) j = 0 . 

Z is determined by solving the master linear program of 

Chapter III. A, yielding the duals u 1 = -z. Subproblem 
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solutions attempt to provide descent directions for this 
penalty function. 

P(*) may also take standard non-linear forms such as the 
quadratic function, . 5h | | Q (x) | | 2 , where z is replaced here 
by the scalar h, Q(x) = (AjX-b^j) + = max ( 0 , Aj x-b^ j ) , and 
||*|| denotes the Euclidean norm. Penalty function theory 
tells us precisely how to solve this problem: solve the 

inner minimization as h 00 . As h 00 , x + x* and h(Ax-b 1 ) + 
converges to an optimal set of dual multipliers. 

However, starting with h large produces a highly ill- 
conditioned problem, so the problem is usually solved for a 
sequence of increasing values for h, producing a sequence of 
improving, but infeasible solutions. An augmented 

Lagrangean penalty function is investigated for reducing 
ill-conditioning and speeding convergence. 

A reasonable approach for solving these nonlinear 
penalty problems is some feasible descent method. For fixed 
h, feasible descent directions are derived for cx + P(h,Ax- 

/N 

b) at x = x by solving the linear subproblem 

/V A 

min V£(cx + P(h, Ax-b^) ) x 
st x £ F 

to obtain x. The Frank-Wolfe algorithm (1956) would perform 
a linear search in the direction x-x to obtain a new x and 
iterate. This type of master problem tends to have poor 
convergence in practice so we employ Restricted Simplicial 
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Decomposition (Hearn et al., 1984) which maintains a 
restricted set of extreme points of F together with x. 
Instead of solving a simple line search the master problem 
solves a nonlinear program over the convex hull of the 
retained points. The loss of simplicity in the master 
problem is typically offset by improved convergence of the 
overall algorithm. 

There is no strong rationale for replacing a hard linear 
problem with a seemingly more difficult nonlinear problem. 
Consequently, this approach has not previously been 
considered for large-scale applications. However, we show 
in Chapter III that this approach can be attractive. 
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C. SURVEY OF RELATED LITERATURE 

This section reviews the literature which has led to the 
current state of the art in algorithms for the MCTP. We 
begin with a brief overview of single-commodity network 
algorithms because of their computational importance in many 
algorithms for the MCTP. We then mention prior 
contributions to a few special cases of the MCTP, and 
finally review literature on algorithms used to solve 
general MCTPs. 

Single-commodity network flow problems have been widely 
studied since the 1940s, for two primary reasons: they are 
frequently encountered and their special structure lends 
itself to algorithms which are more efficient than general 
linear programming techniques. The constraint matrix of the 
pure network problem is a node-arc incidence matrix: all 
0's and ±l's, with at most one +1 and one -1 per column. 
This particular structure has three desirable properties: 
total unimodularity, which guarantees integer primal 
solutions given integer right-hand sides, and integer dual 
solutions given integer costs, a basis matrix which may be 
triangulated by permutation of rows and columns and thus 
simplifying computation, and primal extreme point solutions 
equivalent to spanning trees in the network. Several 
algorithms were developed in the 1950s and 1960s which made 
at least some use of these properties. Primal simplex 
methods were proposed by Dantzig (1963) and Fulkerson and 
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Dantzig (1955). Primal-dual methods became popular in 
practice at that time, including Kuhn's Hungarian Method 
(1955) and the out-of-kilter algorithm of Ford and Fulkerson 
(1962) . 

Primal-dual methods were favored throughout the 1960s, 
but some works, most notably the basis-labelling scheme of 
Johnson (1966), set the stage for research which led to the 
efficient primal network algorithms in use today. 
Subsequent research focused on efficient data structures and 
their manipulation, which led to compact basis 
representations and efficient performance of the simplex 
pivot. Significant research was done in the 1970s by 
Srinivasan and Thompson (1972,1973), Glover, Karney and 
Klingman (1974), Glover, Karney, Klingman and Napier (1974), 
Glover, Klingman and Stutz (1974), Barr, Glover, and 
Klingman (1979), and Bradley, Brown and Graves (1977). 
Results presented by these researchers demonstrated the 
primal network simplex to be up to two orders of magnitude 
faster, to require much less memory than general simplex 
solvers, and to be about 40% faster than out-of-kilter 
codes . 

This research has given us network codes, such as GNET 
(Bradley, Brown and Graves, 1977) , which can solve large 
(say, many thousands of nodes and arcs) single-commodity 
network problems very efficiently. This research is doubly 
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important for the MCTP since it allows efficient subproblem 
solution in many of the algorithms developed for the MCTP. 

Some of the pure network structure found in the single- 
commodity network problem carries over into the MCTP. 
Single-product networks do appear as blocks along the 
diagonal of the constraint matrix, but, unfortunately, the 
joint capacity constraints couple the networks together and 
generally destroy total unimodularity of the constraint 
matrix, admitting fractional solutions and requiring real 
arithmetic. It is appealing to try to restore total 
unimodularity or product independence by transformation. 



Some success in this 


area has 


been 


reported 


by 


Evans 


(1976, 1978a, 1978b, 1983) 


and by 


Evans, 


Jarvis 


and 


Duke 



(1977). The class of problems for which their methods work 
is quite restricted, however: total unimodularity is shown 
to hold when the number of sources or sinks for each product 
is less than or equal to 2, and the existence of 
transformations to single-product networks is shown when the 
number of sinks per product equals 2. Thus, this is of 
little help to the general MCTP. 

Algorithms for the general MCTP are broadly classified 
as direct methods which exploit special structure using the 
simplex algorithm or indirect methods which use some form of 
decomposition. The indirect methods include price-direction 
and resource-direction. 
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We first review the development of price-direction. 
Recognizing that the number of extreme points in a master 
problem may be huge. Ford and Fulkerson (1958) devised a 
procedure for the multicommodity maximal flow problem which 
uses column generation to produce favorable extreme points 
as needed. In their formulation of the problem, the 
variables correspond to chains or paths through the network, 
and the constraints represent individual arcs. Dantzig and 
Wolfe (1960) formalized the procedure into the dual 
decomposition procedure in which the master problem is 
solved to provide pricing information to the subproblems. 
Solving the subproblems produces an extreme point, which is 
added to the master problem if it is favorable, or indicates 
optimality of the current master problem solution if it is 
not . 

Tomlin (1966) first formulated and implemented dual 
decomposition for the minimum cost MCTP, showing the 
equivalence of the Ford-Fulkerson algorithm to the algorithm 
described by Dantzig and Wolfe. Others have developed 
extensions of the MCTP using the dual-decomposition 
approach. Cremeans, Smith and Tyndall (1970) and Wollmer 
(1972) formulated a model in which flow on some arcs depends 
on the availability of resources which are shared with other 
arcs. Weigel and Cremeans (1972) extended the model to 
allow the flow of each commodity to be measured in distinct 
units, joint arc capacities in common units, and to 
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incorporate node capacity constraints. Swoveland (1973) 
presented a generalization of the MCTP which involved a 
decomposition in which a single subproblem is solved in two 
stages. 

Considered an effective technique in early iterations, 
dual decomposition has a reputation for poor convergence, 
with progress toward optimality tailing off in later 
iterations. Ho (1984a) speculated that this failure to 
converge is due to numerical error in computer 
implementation. Ho and Loute (1983) compared solution times 
of several problems for a commercial linear programming 
package and decomposition codes. They concluded that 
standard LP is more efficient when it is practical, but 
pointed out that decomposition is nearly as efficient for 
the large problems tested. Ho (1984b) further speculates 
that efficiency of the decomposition increases with problem 
dimension. 

Simplicial decomposition, an indirect price-directive 
method applicable to non-linear quasi-convex programs was 
introduced by von Hohenbalken (1977). Convergence of a 
restricted version of simplicial decomposition was recently 
demonstrated by Hearn et al., (1984). 

We now turn to resource-direction, which solves the MCTP 
using the subproblem 
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RS (y) 



V(y) = min cx 



st Nx = b 2 
0 < x < y 



and the master problem 

RD min V (y) 

st Ay = b-L 

Yp — ^1 f° r P € P 

y > 0 . 



Because the master problem provides allocations feasible in 
MCTP , RS(y) produces an upper bound on MCTP. When RS(y) has 
a feasible solution, it is feasible in MCTP. It is a simple 
matter for the modeller to introduce explicit artificial and 
slack arcs with penalty costs to insure that any allocation 
will produce a feasible solution to MCTP. 

Resource-directive algorithms proceed by iteratively 
solving the restriction, and then updating the allocations 
in some favorable manner and resolving. Geoffrion (1970) 
provides a good overview of the resource-directive approach. 
The predominant method for computing new allocations is via 
subgradient directions, which are a generalization of the 
gradient for convex or concave functions with 
nondif ferentiable points. This method is applicable since 
V(y) is a piece-wise linear, convex function. 
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The subgradient approach appeared first for solving sets 
of linear inequalities (Agmon, 1954; Motzkin and Schoenberg, 
1954) , and then appeared in the Russian literature adapted 
to optimization problems, e.g. , Polyak (1967). The 
optimization version was first applied in the Western 
literature to the travelling salesman problem by Held and 
Karp (1970,1971), and further developed as a general method 
for optimization by Held, Wolfe and Crowder (1974). The 
convergence rate and stepsize considerations were studied by 
Goffin (1977) and Bazaraa and Sherali (1981). Fisher (1985) 
summarized the subgradient approach to solving the 
Lagrangean dual. 

Resource-directive algorithms for the MCTP using 
subgradients in the direction-finding process for optimizing 
RD were developed by Kennington and Shalaby (1977), 
Rosenthal (1983), and Allen (1985). Ali, Helgason, 
Kennington, and Lall (1980) declare their resource-directive 
algorithm to be faster than either a price-directive or a 
special basis factorization code in their computational 
tests . 

While subgradient resource-directive methods are 
considered exact (Kennington and Helgason, 1980) , zigzagging 
in subgradient methods is known to be a problem (Sandi, 
1979) , so convergence to optimality is frequently slow at 
best. Thus, the reported computational advantage of 
resource direction using subgradient updates is based on its 
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purported ability to reach an acceptable near-optimal point 
before simplex-based methods reach optimality (Ali et al., 
1978, 1980; Kennington and Shalaby, 1977). 

The final major category of algorithms for the MCTP is 
basis factorization or compact inverse methods. Algorithms 
in this category employ a primal or dual simplex approach, 
but exploit the special structure of the LP basis for the 
MCTP to reduce the size of the basis inverse that must be 
carried explicitly. A dual partitioning method was 
presented by Grigoriadis and White (1972) . Graves and 
McBride (1976) discussed the primal approach in the general 
context of mathematical programming. Hartman and Lasdon 
(1972) presented a theoretical primal approach based on the 
Generalized Upper Bounding (GUB) technique of Dantzig and 
Van Slyke (1967). Incorporating graph theory, they factored 
the MCTP basis into a network basis for each product and a 
"working basis" with dimension equal to the number of 
currently binding joint capacity constraints. Several 
efficiencies result. Primal network simplex techniques 
apply to computation on pure network bases. Only 
computations involving the working basis require real 
arithmetic, and this basis is generally quite small. 

Kennington (1977) implemented the method and further 
study has been done by Helgason and Kennington (1977). Ali, 
Barnett et al., (1984) concluded that the GUB approach is 
three to five times faster than standard LP codes, but other 
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studies generally rate resource-directive 
superior to the GUB approach (see, for instanc 
al., 1980, and Kennington and Shalaby, 1977 ). 



algorithms 

e f Ali, et 
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II . A RESOURCE-DIRECTIVE APPROACH TO THE MCTP 



This chapter presents a resource-directive method for 
solving the MCTP which incorporates a simplified projection 
mechanism in the primal subgradient capacity reallocations 
and conjugate subgradient directions with approximate line 
search in a Lagrangean lower-bounding routine. The 
procedure is based on the method of Allen (1985) which 
solves two essentially independent sequences of problems, 
the resource-directive sequence to generate feasible 
solutions and upper bounds, and a Lagrangean sequence to 
generate lower bounds. The purposes for these enhancements 
to Allen's procedure are to provide a computationally- 
supportable, theoretically sound projection in the 
reallocation routine, to provide a better mechanism for 
termination of the procedure, and to attempt to generate 
ascent directions in the Lagrangean routine to make line 
search worthwhile. We first restate the general form of the 
problem to be solved, and then introduce the concept and 
required theory of subgradient optimization. At that point 
we will be able to state clearly the shortcomings of 
previous approaches and present the procedure based on our 
improvements . 

The resource directive approach attempts to solve the 
problem 
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min min cx 

y x 

st Ax = bj_ 

Nx = b 2 
0 < y p < b x , 
x-y < 0 
x > 0 . 

By selecting a particular y e 
allocations satisfying (2.2) 
subproblem is 

V(y) = min cx 

st Nx = b 2 
x < y 
x > 0 , 



( 2 . 1 ) 

( 2 . 2 ) 

(2.3) 

for all p e P (2.4) 

(2.5) 

( 2 . 6 ) 

Y, i.e., a set of capacity 
and (2.4), the remaining 

(duals) 

(u 2 ) 

(u 3 ) 



which is a set of restricted, single-product transshipment 
problems. The original problem is then {min V(y) st y e Y), 
so the outer minimization forms a master problem which seeks 
improving capacity allocations. 

Before turning to the subgradient approach to solving 
this problem, we briefly mention another approach to this 
problem, "Benders decomposition." First, we make the 
simplifying assumption that all problems have finite 
feasible solutions and recall the theorem of duality to 
establish that the optimal values of a linear program and 
its dual are equal. Now the dual of our subproblem is 
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V(y) = max u 2 b 2 - u 3 y 



(2.7) 



st u 2 N - U 3 I < c 

u 2 free, u 3 > 0 , 

and letting the constraint set be represented by its extreme 
points, {u 2e }, {u 3e }, where e £ E, the index set of extreme 

points, it may be rewritten 

max u 2 e b 2 - u 3e y . 
e 

Substituting for V(y) in the master problem yields 

{min max(u 2 e b 2 - u 3 e y) st y e Y} , 
e 

or, equivalently, 

min z 

st z > u 2 e b 2 - u 3e y e e e 
A y = b 1 
0 < y < b>i . 

This is the Benders master problem. In practice, the 
extreme points of the subproblem are not all known, so we 
generate them by iteratively solving the subproblems to 
produce a new extreme point, and then add it to the master 
problem in the form of a constraint. The master, in turn, 
is solved to produce a new allocation. This is the process 
of the Benders (1962) decomposition algorithm which treats y 
as a complicating variable. A full presentation of this 
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method, which also considers conditions of unboundedness, 
may be found in Lasdon (1970) . The approach is not pursued 
here because of the large size of the Benders master 
problem. 

The subgradient approach attempts to solve the master 
problem by simple updates of the capacity allocations at 
each iteration, y k+1 = Pr[y k + s k d k ] , where Pr indicates a 
projection such that y k+1 e Y, d^ is a subgradient and s k is 
from a scalar step sequence, (s k ) satisfying 

I s k = «, s k > 0, and s k + 0 as k 00 . (2.8) 

Polyak (1967) demonstrated that these projected updates 

converge in the limit to an optimal solution simply by 

/\ 

assuring that (2.8) is met. Letting y k+1 = y k + s k d k , the 
projection finds a feasible y k+1 = argmin( | | y-y k+1 | | 2 st 
y e Y). 

The subgradient itself is a generalization of the 
gradient for a function with nondif ferentiable points. If f 
is a convex function with nondif ferentiable points defined 
on a feasible region F, then d is a subgradient of f at 
z £ F if 



f(z) > f( z ) + d* (z-z) for all z e F . 

The set of subgradients at a point is called the 
subdifferential, defined by 

9f(z) = (d|f(z) > f(zj + d'(z-z) for all z e F) ; 
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when z is a differentiable point of f, 3f(z) = 7f(z). 
Furthermore, there exists a set of subgradients which are 
the limits of the one-sided directional derivatives at z. 
These particular subgradients, referred to as "primitive 
subgradients," may be used in convex combinations to 
generate any member of the subdifferential. This 
presentation is made in greater detail by Rockafellar (1970) 
and Sandi ( 1979) . 

Demjanov (1968) demonstrated that at any nondif f erenti- 
able point of a convex function f, there exists a 
directional derivative -d m = f'(z) e 3f(z) which is the 
locally best direction of descent. That direction is the 
minimum norm of the subdifferential; that is, 

d m = argmin { | | g | | 2 st g e 3f(z)} 

which is the point of the subdifferential closest to the 
origin. Furthermore, a point z* is an unconstrained optimum 
if and only if -d m = f'(z*) = 0. 

Although using the minimum norm may be attractive as a 
descent direction, the required primitive subgradients are 
usually not all known; the usual approach is to find a 
single subgradient at each iteration and update the 
allocations using it along with a step sequence satisfying 
(2.8). The method works because any element of the current 
subdifferential forms an acute angle with the true direction 
to the optimal point, so by taking a step of the appropriate 
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length, we move pointwise closer to the optimal solution at 
each iteration. However, it is not necessary for the 
objective function value to improve at each iteration. The 
sequence achieves an optimal point when a zero-subgradient 
is encountered (Held, Wolfe, Crowder, (1974)). 

The method is attractive due to its extreme simplicity 
and is commonly used as a mechanism for multiplier 
adjustment in Lagrangean relaxations (e.g., Fisher, 1985) 
and for capacity reallocation in resource-directive 
algorithms for the MCTP (see Kennington and Shalaby (1977), 
Rosenthal (1983), or Allen (1985)). 

The choice of stepsize sequence is critical to the 
success of subgradient methods. For instance, the harmonic 
series, s k = 1/k, satisfies (2.8) but exhibits poor 
convergence in practice (Bazaraa and Sherali, 1981) . 
Convergence has also been demonstrated for 

s k = n(v*-f (z k ) ) / | | g k | | 2 

where 0 < n < 2 is a scalar multiplier, v* is the optimal 
objective function value, and g k is the norm of the 
current subgradient (Polyak (1967)). Since v* is generally 
not known, several researchers (for example, Kennington and 
Shalaby (1977) and Bazaraa and Sherali (1981) have replaced 
it by approximating values such as upper bounds or 
combinations of bounds. Goff in (1977) discusses a geometric 
stepsize progression which exhibits quadratic convergence, 
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but may not achieve an optimal point. Some good experiences 
have been reported for these methods, but these "heuristic" 
stepsizes frequently produce poor convergence, or, worse, 
may converge to non-optimal solutions. It is common 
practice to include in the process a heuristic rule for 
modifying the value of n when progress is slow. 

We now present the specific application of the 
subgradient approach to the MCTP, first to the resource 
direction routine in Section A, then to the Lagrangean 
routine in Section B, and finally present the overall 
procedure in Section C. 
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A. GENERATING UPPER BOUNDS VIA RESOURCE DIRECTION 

We use the subgradient approach in resource direction to 
perform the update, y k+1 = Pr[y k + s k d k ] for y k ,y k+ l e Y, to 
solve the problem min V(y) st y e Y where 

y 

V(y) = min cx 

st Nx = b 2 
x < y 
x > 0 . 

Notice that changes in capacity allocations act as 
parametric changes to the right-hand side of the subproblem, 
so V(y) is convex with respect to changes in y: a proof of 

this is given in Kennington and Helgason (1980). 

To identify an appropriate subgradient, we first recall 
(2.7). For any allocation y producing a feasible solution, 
we have V(y) = u 2 b 2 - u 3 y, where u 2 and u 3 solve the dual 
program. The following proof from Kennington and Helgason 
(1980) shows that -u 3 is a subgradient of the function V at 

y: 

Lemma 2.1: Let y > 0 be any feasible allocation to V(y) 

and (u 2 ,u 3 ) be the associated optimal dual 
solution. Then -u 3 is a subgradient of V at 

y- 

Proof: Let y, y e Y be any feasible allocations, with 

/S /S 

optimal dual solutions (u 2 ,u 3 ), (u 2 ,u 3 ) in V(y), 
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V(y), respectively. Then 

/N ^ ^ ^ 

V(y) - V (y) = u 2 b 2 - u 3 y - (u 2 b 2 - u 3 y) > 

u 2 b 2 - u 3 y - (u 2 b 2 -u 3 y) = -u 3 (y-y), or 

V(y) > V(y) - u 3 (y-y) so -u 3 is a subgradient of V 

at y. QED. 

Therefore, a subgradient is directly available from the 
subproblem at each iteration as the negative of the optimal 
dual u 3 , for allocation y. 

If -u 3 were a feasible direction, the subgradient 
reallocation would be y k+1 = y k -s k u 3 k . However, since -u 3 
generally yields infeasible allocations, we project the 
infeasible reallocation y = y k -s k u 3 k to a feasible 
allocation, y k+1 by solving the quadratic program 

min | | (y k+ 1 -y ) | | 2 
St y k+1 c Y 

An equivalent form of this problem is 

min i (y P j k+ 1 -y P j ) 2 st y k+1 e Y / 

P j 

which decomposes on j . Solving a quadratic program for each 
jointly constrained arc at each iteration is a burdensome 
task, so a heuristic projection is generally used which does 
not involve the quadratic functions. 
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Unfortunately, computational results of Allen (1985), 
for instance, indicate that the method is unable to obtain 
near-optimal solutions on even moderate-sized problems, even 
when the bulk of the solution time is spent on the primal 
reallocations . 

Two factors seem to contribute to this failure. First, 
the nature of the subgradient itself is that no primitive 
element of the subdifferential is guaranteed to be an 
improving direction. Second, the space of the primal 
variables is relatively large, complicating the problem. 

That is, if = (j|£ x p j * = b lj x * optimal in MCTP} , then 

P 

in applying the subgradient approach to the Lagrangean 
problem, we work with 0(|J T |) dual variables at each 
iteration; in the primal resource allocation, we make 
0 ( | J T | * | P | ) reallocations. Thus, as the number of products 
grows, the master problem becomes substantially harder. 

We implement a projection method which ignores 

0 < Yp < b^ and maintains the bounds externally by, in 
essence, a simple minimum ratio test. The resulting 
projection can be solved analytically, and applied in 
practice through a simple set of calculations. Therefore, 
although it does not relieve the previously mentioned 
shortcomings of the subgradient method, it does not add to 
them. Furthermore, it preserves theoretical convergence of 
the method for a stepsize sequence satisfying (2.8). 
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If we let y k and u 3 k be the allocation and dual 
variables obtained at iteration k, the infeasible 

A 

reallocation y and the feasible (projected) allocation 
y k+1 , then we may express the last two allocations as 

y = yk _ s k U;3 k 



and 



yk+1 




s k u 3 



k 

9 



The resulting quadratic program is 

min | | y k+ 1 -y | | 2 
st Ay k+1 = b 1 . 

Recognizing that y k+1 -y = -s k u 3 k + s k u 3 k , that Ay k = b^, and 
that s k is merely the scale factor, an equivalent problem is 

min | | u 3 k -u 3 k | | 2 
st Au 3 k = 0 . 

For a single constraint j , dropping the iteration count, 
the resulting problem is 

/\ 

min u 3 j*u 3 j - u 3 j 'u 3 j 
st 1 ' u 3 j = 0 . 

The Kuhn-Tucker conditions for this problem are 

A 

U 3 j - U 3j - 1-w = 0 (2.9) 
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where w is the dual variable associated with the constraint. 
Premultiplying by 1', recognizing that l'u 3 j = 0 and solving 
we find w = - ( 1 ' 1) -1 1 ' u 3 j . Substituting into (2.7) and 
rearranging, the familiar projection 

u 3j = (I - l(l , l)“ 1 l')u 3 j 

is obtained. Observing that (l'l) -1 = 1/|P| and rewriting 
to summation form, we see that the projected subgradient for 
the p-th product on arc j is 

u 3pj = t ( I P I ~1) u 3pj/ I P I 1 ~ T u 3p'j/l^ > l = u 3pj~ u 3j 

P^P 



where 



u 3 j = 1 u 3pj/l p l 
P 



The associated reallocation update is yp^ +1 = yp k - 

sk ( u 3 p-U 3 ), where s k is from an appropriate stepsize 



sequence , { s^} . 

Since the bounding constraints were ignored, ypj < 0 or 
y?j > Uj may result. In this situation we perform a minimum 
ratio test, setting 0 < sj ' < sj^ such that 0 < ypj < b^j 
for all p on arc j. We drop all products with Xpj = 0 and 
u 3p j < u 3 j on arc j , recomputing u 3 j and reallocating among 
the remaining products to simplify the process 

computationally. In addition, if some Xpij = bj with u 3 p»j 

= min(u 3 pj), no reallocation is performed for that arc. 

P 
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U 3 j for all p on some arc j , then no 



Finally, if u 3p j = 
reallocation occurs on that arc. 

We write the procedure as follows, given x^, u 3 ^ from 

the current subproblem, and a stepsize s^-: 

REALLOT: Compute u 3 j k = (AjU 3 )/|P| 

{For each arc, j: 

Cond 1: If there is a p' with x p i j = b^j 

and u 3p . j = min(u 3 p j) , 

P 

let y pj k+1 = y pj k 

Cond 2: If u 3p j = u 3 j for all p, 

let y p j k+1 = y p j k 

Cond 3: Identify P N = {p|x p j = 0,u 3p j > u 3 j } 

If P^j ^ p, recompute 

u 3 j = u 3pj/ ( I p I ” I P N I ) 

P/ p N 

Over all p f P N 

set ^pj^ +1 = ypj^ ” s j^( u 3pj” u 3 j ) 
such that 0 < Sj^ < s^ and 

0 < y p j ^ +1 < b^j for all p e P 

End for) 

Subgradient reallocation need only be performed for the 
set of joint arcs which are currently tight. For all other 
arcs, if there is some x p j = y p j with favorable reduced 
costs, but A j x < b^j for that j, we perform a simple 
reallocation in the manner of Rosenthal (1983), taking slack 
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from any p' with Xp ■ j < Yp ' j until it is all used. Thus, in 
the solution process, all available slack on a joint arc j 
is offered to each product until it is consumed, at which 
point that arc becomes a candidate for subgradient 
reallocation. 

In practice, we first perform simple reallocations until 
no further improved solutions are found, and then 
incorporate subgradient reallocations into the ensuing 
iterations for tight joint constraints. The set of tight 
constraints is updated with each pass through the 
subproblems. We note that if at any iteration k, there is 
no arc j to which a simple reallocation may be applied and 
all arcs with AjX = b]_j also have u 3 pj = u 3 j for all p, we 
have achieved an optimal point (Rosenthal, 1983) . 

Unfortunately, the optimality condition is not 
frequently achieved, nor is a useful lower bound available 
from the dual information of the restricted problems, so we 
establish lower bounds on MCTP through a Lagrangean 
relaxation routine in hopes of terminating through bound 
convergence. The Lagrangean approach is explained in the 
following section. 
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B. LAGRANGEAN RELAXATION USING CONJUGATE SUBGRADIENT 
DIRECTIONS 

Lagrangean relaxation is a frequently-used device to 
assist in solving linear programs in which a set of 
complicating constraints are placed into the objective 
function with penalty multipliers which are estimates of the 
optimal dual multipliers to the original problem. Recalling 
Chapter I, the problem becomes 

LR max min cx - u^Ax-b^ st x e F, 

Ul<0 x>0 

which is frequently solved by fixing Uj < 0 and solving 

LR (u^) V^) = mintc-u-LAJx + u^-l 

st x e F , 

to yield a lower bound on the original problem. A new u-l is 
then found and LR(u-l) resolved, repeating until optimality 
is achieved. 

A common method of accomplishing the u-l updates is by 
subgradient optimization, where g = (Ax-b^ is a subgradient 
and 



u^ 1 = min (0 , u^-s^g^) 

is the update. The "min" operator solves the projection 
min | | u 1 ^ :+1 -u 1 1 | 2 st u 1 ^ +1 < 0 (see for instance, Fisher, 

1985). However, the method has several drawbacks. First, 
due to the nature of the subgradient, the bound is not 
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always improved, and due to the sensitivity of the process 
to choice of s k , convergence may be slow, or the process may 
converge to a non-optimal point. Due to the relaxation, it 
is not necessary that a primal feasible solution ever be 
generated and, since we only determine one primitive 
subgradient at each iteration, we are unlikely to find a 0- 
subgradient and thus recognize an optimal solution. 

In this section we introduce a procedure which seeks to 
overcome some of these difficulties by retaining information 
on previous subgradient directions in order to approximate 
the subdifferential at the current point. The concept, 
developed originally by Wolfe (1975) involves computing a 
"conjugate subgradient" direction at each iteration which is 
the nearest point to the origin of a set of m retained 
subgradients. The method relies on the property that the 
subdifferential of any point can be arbitrarily well 
approximated by the convex hull of gradients of the points 
in its immediate neighborhood (see Rockafellar, 1970, for 
details) . The algorithm generates a sequence of points 
{u^} and subgradients {g^} and computes a search direction 
d k as the solution to 

(CD) min | | d | | 2 

m 

st £ w n g^~ n = d 
n=0 

m 

I W n = 1 

n=0 

w >0 for n = l,m 
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for some integer m. At some step of the algorithm, some 
subsequence of points, { u^^, Uj^ -1 , . . . , u 1 ^~ m } is close enough 
together that 

3f(z k ) = convtg^g^ -1 , . . . ,g k_m ) 

and 

d k i 0 . 

When this occurs, we are sufficiently close to the optimal 
solution to terminate with u^ as an approximation for u^*. 

Wolfe's algorithm calls for a single variable search to 
optimize the objective in the conjugate subgradient 
direction; this is computationally expensive since the inner 
minimization involves solving the set of network subproblems 
for various step lengths. So, we find s^ approximately. 
Define a nominal stepsize s n as 

S n = n (V-V) / | |g k | I 

where | |g k | j is the Euclidean norm of the current 
subgradient, and n is a scalar multiplier. V is the current 
upper bound, and V is the current lower bound. Then we 
evaluate W(n) = Vfu-j^ + s n d k ) f° r n = and 2 - Since the 

objective function value of the dual of a linear program is 
piecewise-linear and concave as a function of the multiplier 
values (Bazarra and Shetty (1979)) this surface can be 
approximated by fitting a quadratic interpolating function 



44 



through these three observations. Using the Newton-Gregory 
forward equation to interpolate on equally spaced data, we 
calculate 

W(n) = W(0) + nWl + .5n(n-l)W2 (2.10) 

for 0 < n < 2 , where 

W1 = W(l) - W(0) 



and 



W2 = W ( 2 ) - 2W ( 1) - W(0) . 

Details may be found in Gerald and Wheatley (1984), for 
example. To compute the appropriate step length, we select 

A ~ ~ 

n^ = argmax (W(n) |0 < n < 2) 

and set s^ = n^(V-V)/| |g^| | . The resulting multiplier 

update is u^ -1 " 1 = max (0 , u^+s^d^) , and we solve for V(u 1 ^ ;+1 ) 
to complete the approximate line search. 

Solving LR(Ui* :+1 ) provides a new subgradient g^ +1 = 

(Ax^ +1 -b 1 ) + which is added to the retained set. We are then 
ready to compute a new conjugate direction using (CD) . 

Since we are approximating the subdifferential, d^ will 
not always be an ascent direction. In that case, we simply 
take a small step in the direction d^, generate g^ +1 to 
improve our local approximation of the subdifferential, 
compute a new d^ +1 , and continue. 
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When d^ is near zero, we must insure that the sequence 

of u^ ' s generated is suitably close together to adequately 

approximate the subdifferential at u^. Wolfe suggests this 

m 

amounts to the check J | | U 2 k-n - u 1 k-n+1 | | < M for some 

n=l 

M > 0. If this condition holds, u^ is accepted as near 
optimal; if not, all retained subgradients are dropped and 
the process is restarted from u^ k . 

It is possible to produce feasible capacity allocations 
from Lagrangean solutions, which may be used as starting 
points for resource-directive procedures. These allocations 
must be integer to preserve integer arithmetic in the 
network subproblems. Given a current value for x, the 
capacity allocation y may be computed by 



(CA) 



Pr[x p j-(bij/2 x p j)] 
P 



if j € J v 



( 2 . 11 ) 



ypj 



P r [ x pj + ( x pj klj)/lpl] 

p 



if j l Jv 



where Pr[*] indicates a projection onto the set of integers 

satisfying ypj = Uj and ypj > 0, integer for all p and 

P 

j e J. In practice, simple rounding is sufficient. 

Note that the conjugate subgradient approach is also 
applicable to the resource-direction problem, but we chose 
not to use the conjugate approach due to excessive storage 
requirements. While for the Lagrangean we must store m 
subgradient vectors, in the resource-directive problem we 
must store m*|P| vectors. 
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C. A RESOURCE DIRECTIVE PROCEDURE WITH LAGRANGEAN 

LOWER BOUNDING 

We now present a procedure combining the resource 
direction of Section A with the Lagrangean routine presented 
in Section B for solving the MCTP. Upper bounds and 
feasible solutions are obtained via the resource direction, 
while lower bounds are obtained from the Lagrangean routine. 
Although we present a specific stepwise arrangement, the 
resource directive and Lagrangean routines are coupled only 
by the values of the bounds. They may be performed 
sequentially, iteratively, or in parallel, exchanging bound 
information when appropriate. 

Define the stopping criterion e > 0 as the allowable 

relative error between bounds, D > 0 as the acceptance 

criterion for a near-zero direction, and U > 0 as the 
allowable Euclidean separation of points used to approximate 
the current subdifferential. Let V and V be the current 
bounds and x be the incumbent solution. Also let K be the 
maximum allowable number of iterations. 

The Algorithm RDLB follows: 

Input: The network T = {I,J}, and joint capacity vector 

b]_, and for each product p = 1,...,|P|, a cost 

vector Cp, and a supply/demand vector b 2 p 
Output: Incumbent solution x, and incumbent value cx. 

step 0 (Initialize): Select e > 0, D > 0, U > 0, m > 1, 

K > 1 integer, set k=0, u°=0, J v = d 



47 



Solve LR ( 0 ) , compute J v = (j|AjX°-b^j > 0}, 
Evaluate V(0) obtaining x° 

If J v ^ 0, stop with x° optimal in MCTP 
Else , V = V(0) 

set d° = g° = (Ax°-b 1 ) + 

Set y° = CA(x°) 

Solve RS(y°) with Xy* optimal, set V = V(y° 

X = Xy* 

If (V - V)/V < e, exit. 

step 1: Solve Lagrangean 

la (Line Search) : Compute s = n(V-V)/| | g k | | , 

W(n) for n = 1,2 

Set n k = argmax{ W(n), 0 < n < 2} 
s k = n k (V-V)/ | | g k | | 
lb (Move to new point) : 

Set u x k+1 = max(0,u 1 k + s k d k ) 

Solve LRtU;^ 1 ) 

If V^u^* 1 ) > V, V = V(u 1 k+1 ) 
if (V-V) /V < e, exit with incumbent x 
Else, compute subgradient g k+1 where 

I AjX k -b 1 j for j e J v 

gjk+ i . 

( 0 otherwise 

set J v = J v j { j | A j x-b x j > 0) 
set G k+1 = (g k+1 gk-m+1 , 



solve 
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lc (Compute New Direction) : 



Let d 



k+1 



if i id k+i i i 



= argmin{ | | d | | 2 st d e conv(G k+1 ) } 
> D, set k = k+1, go to step 1. 

< D and £ | | u 1 k_n -u 1 k_n+1 | | < u, 



n=0 

k = 1, go to step 2 



< D and £ | | u 1 k_n -u 1 k_n+1 | | > u, 
n=0 

set d° = g k+1 , 



u l° = u l k ' k = 0, G° = d, 
go to step 1. 



step 2 (Resource Direction) : 

2a. Solve RS(y) for the current y k 
If V (y) < v, V = V (y) , x = x y * 

If (V-VJ/V < e, exit with x 
Else let J>p = ( j | AjXj^-b-^j = 0} 

2b. (Reallocate) 

For all j e J T , perform the procedure REALLOT 
to obtain y p j k+1 = Y p j k - Sj k (u 3p j k -u 3 j k ) 

If y p j k+1 = y p j k for all j e J T , exit 
If k > K, exit 
Else, go to step 21. 



Two aspects of this algorithm represent additional 
computational burdens over a standard subgradient approach 
which must be justified. First, the direction-finding step 
requires solving a quadratic program. This is a small 
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program of m variables which may be solved efficiently as a 
linear complementarity problem using the Kuhn-Tucker 
conditions (Bazaraa and Shetty, 1979) . Since the purpose of 
computing conjugate directions is to reduce the zigzagging 
common to subgradient algorithms, the potential for fewer 
iterations justifies the effort. 

Second, the line search requires two additional 
subproblem evaluations at each iteration. This means, in 
effect, that 3 normal subgradient steps can be taken for 
each conjugate step taken. It is not clear that this effort 
will always be justified. An alternate form of the 
conjugate subgradient algorithm which overcomes this problem 
first takes a series of subgradient steps, retaining the 
subgradients, and then periodically performs a conjugate 
subgradient step with quadratic approximation. This 
amortizes the cost of the search over several iterations. 
After each conjugate step, old subgradients are dropped and 
a new collection begins. Computational results presented in 
Chapter IV show that the conjugate directions method does 
generate a direction resulting in a large increase to the 
lower bound every few iterations, but is rather slow. 
However, because the entire procedure performs much worse 
than other algorithms tested, the alternate forms suggested 
above have not been tested. 
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Ill . PRICE DIRECTIVE DECOMPOSITION ALGORITHMS 



In this chapter we develop the theory of dual 
decomposition and introduce a new decomposition algorithm 
which solves a penalized version of the original linear 
problem. This is a procedure which uses restricted 

simplicial decomposition to construct a nonlinear master 
problem and conveys price information to the subproblems 
through the gradient of the objective function. With proper 
choice of penalty parameters, the subproblems quickly 
produce good Lagrangean lower bounds on the original 
problem, and improving primal solutions are produced by 
resource direction using the (infeasible) penalized master 
problem solutions. 

Both algorithms are applicable to general linear 
programs and their development here is accordingly general. 
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A. THE DUAL DECOMPOSITION ALGORITHM 

The standard approach in describing the dual 
decomposition algorithm is to put forth the idea of 
representing the feasible region of a subset of the 
constraints of an LP as a convex combination of its extreme 
points. This reduces the number of constraints, but 
introduces a large number of variables (the extreme points 
of the dualized constraints) , so the problem becomes one of 
finding the extreme points which, in convex combination, 
describe the optimal solution to the original problem. 
Dantzig-Wolfe decomposition combines this extreme point 
representation with column generation, which is the process 
of generating extreme points as required. 

If we let X be a matrix whose columns are the extreme 
points of F, then any x in F may be expressed as 

x=Xw, 1 • w = 1 , w > 0 , 

where l*w = 1, w > 0 enforce convex combinations of the 

extreme points. 

Substituting this form yields the Dantzig-Wolfe master 
problem, at the iteration 



min cX^w^ 






(duals) 


(3.1) 


st (Ax^)w^ 


< 


*1 


U 1 


(3.2) 












1 • w^ 


= 


1 


u o 


(3.3) 




> 


0 




(3.4) 
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where is a matrix of extreme points collected so far, and 
(3.3) and (3.4) are the convexity constraints. 

Each time the master problem is solved, we check to see 
if it is an optimal solution by computing the reduced costs 
for the non-basic extreme points of F. It is not necessary 
to check them all since the most negative reduced cost is 
found by solving the problem 

min (c-u^Jx - u Q (3.5) 

st x € F , 

which produces an extreme point of F. This is the Dantzig- 
Wolfe subproblem, which for MCTP is a set of single-product 
network problems which are easy to solve. 

If x^ solves the subproblem at the k th iteration and 
(c-u^AJx^-Uq < 0, then the reduced cost associated with x^ 
is favorable, so x^ is added to the collection of extreme 
points and we return to the master problem. If (c-u^A)x^- 
u° > 0 , then the reduced cost for x^ is unfavorable. Since, 
due to the minimization, it has the minimum reduced cost, 
then all non-basic extreme points of F have unfavorable 
reduced costs, and the process terminates with the solution 
to the previous master problem optimal in MCTP. 

A second approach may be taken to the dual decomposition 
algorithm which considers the decomposition as a cutting 
plane process (Kelley, 1960) , resulting in a master problem 
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which is dual to the standard master problem (e.g., Graves 
and Van Roy, 1979) . 

The problem to be solved is 



(P) 


min cx 


(dual variables) 




st Ax < bi 


(u x ) 




Nx = b 2 


(u 2 ) 




X 

IV 

o 





whose dual is 

(D) max ub = u^b^ + u 2 b 2 

u^A + u 2 N < c 
u x < 0, u 2 free . 

The feasible regions associated with the various constraints 
are 

F P1 = (x|Ax < b x } 

F P2 = (x|Nx = b 2 } 

Fp = F pl and F p2 and {x|x > 0} 

F^^ = {u|u^A + u 2 N < c} 

F D = F D1 and {u| Ul < 0}. 

The respective values of the primal and dual problems are 
written V(P) = cx and V(D) = ub. 

Let f(u,x) = cx - u^Ax-b^, which we recognize as the 
Lagrangean of (P) with respect to the constraint set A. 

Lemma 3.1: If x e F p2 , x > 0, ueF D , u 2 < 0, then 
f(u,x) > ub. 
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Proof: Write the identity: 

ub = cx - u^Ax-b^ - u 2 (Nx-b 2 ) - (c-u 1 A-u 2 N) x . 

Rearrange and substitute: 

ub = f(u x ,x) - u 2 (Nx-b 2 ) - (c-u 1 A-u 2 N) x. 

But the two right-most terms are < 0, giving the 

desired result. QED. 

This suggests that by fixing u lf the following 
subproblem results: 

(SUB(u-l)) min f(u 1 ,x) = Ujbj + (c-u-lA)x 

st Nx = b 2 (u 2 ) 

x > 0 

This is the Lagrangean relaxation of (P) with respect to 
A, with Uj fixed. If (SUB(u 1 )) is infeasible, so is (P) ; 
otherwise we generate \ jl ± using a master problem in a 
convergent algorithm. 

Let u k = (u 1 ^ c ,u 2 ^ c ) be a composite dual solution at step 

( 

k of the algorithm. The following lemmas describe its 
properties : 

Lemma 3.2: Let x k and u 2 k be respective optimal primal 

and dual solutions of (SUB(u-l) ) . Then u k b = 
fCu-^x*) = VCSUBCu-l)), and u k e F D . 
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Proof : 



Again, use the identity 

ub = cx-U! (Ax-b]^) -u 2 (Nx-b 2 ) - (c-u 1 A-u 2 N) x . 

So u k b = f (u^, x k ) -u 2 k (Nx k -b 2 ) - (c-u 1 k A-u 2 ^ c N) x k . 
Since u 2 k ,x k are optimal in (SUB(U;l)), by 

complementary slackness 
u 2 k (Nx k -b 2 ) = (c-u 1 ^A-u 2 ^ c N) x k = 0. 

(SUB(u-L k ) optimal implies 

(c-u 2 k N) -u^A > 0, and U]_ k < 0, yielding u k e Fq. 
QED 

According to Lemma 3.2, for any u^ k <0, a feasible 

solution u k = (u 1 k ,u 2 k ) e F D can be constructed with value 

V(SUB(u 1 )). Let u be the best solution to (D) currently 
known. The following lemma establishes the necessary 
conditions for improving the incumbent solution, u. 

Lemma 3.3: Let K = {k|x k € Fp 2 > > 0 } • A necessary 

condition for VCSUBCu^-l) to yield a dual 

solution value better than the incumbent value 

ub is that f(u,x k ) > ub+e, k e K, for some 
e > 0 . 

Proof: x k £ Fp 2 is feasible in (SUBCu^) ) for any u^, so 

VCSUBCUx)) < f(u,x k ). For VCSUBfUx)) > ub, u ± must 
satisfy f(u^,x k ) > ub+e for k <- K and e > 0. QED 

The existence of a convergent algorithm for finding u^* 
is established as follows. 
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Lemma 3.4: 



The sequence (SUB(u;l)) with arguments {u-^} is 
finite if {x e Fp 2 ,x — °) i s hounded, V(D) is 
finite, and at any step k, u^ satisfies 
f(u 1 ^ c ,x) > ub+e for 1 = l,...,(k-l) for some 

e > 0. 

Proof: SUBCu^) yields basic solutions, x^, which are 

finite in number. The lemma follows if any 
repeated basic solution yields termination of the 
sequence. By lemma 2, u k b = f(u;L k ,x k ) and u k e F D . 
If x k = x* for some 1 < k, then u k b = f(u^^,x^) = 

f(U;L k ,xl) > ub+e since u^ k satisfies f(u;L k ,x ) > 
ub+e for 1 = 1 . . . (k-1) . Thus u^ is a new 
incumbent, and with e > 0 and V(D) finite, there 

may only be a finite number of such updates. QED 

This criterion leads naturally to the master problem: 

f(u 1 ,x 1 ) e cx 1 -u 1 (Ax 1 -b 1 ) > ub+e, 1 = l,...,k 

i 

u x < 0 

Notice that the objective function is unspecified: any 

objective will suffice. Experience shows that the most 
recent cut works well in practice. That is, 

(MP (x) ) max f(U 2 ,x k ) 

st f(u 1 ,x-*-) = cx^- -u^Ax-L -b^) > ub+e, 

1 = l,...,(k-l), U]^ < 0 
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MP(x) may be unbounded, in which case any Uj^ < 0 such 
that f(u,x^) > ub+e, 1 = l,...,(k-l) will suffice. When 

V (MP (x) ) < ub+e, the process terminates with an e-optimal 

solution . 

According to the following lemma, a feasible primal 

solution to (P) can be recovered whenever V(MP(x)) is 

finite . 

Lemma 3.5: Let >0, 1 = k-1) be the optimal 



dual solution to MP(x) at iteration k. If 



V (MP (x) ) is finite, a primal feasible solution 



to (P) is 



k-1 



k-1 



[x k + l w 1 x 1 ]/[l + l w x ] 



1=1 1=1 



1=1 



Proof: MP ( x) is restated in canonical form as 



max -u^Ax^-b^ + cx k 

-U 1 (t>i-Ax^) < cx 1 -[ub+e], 1 = 1 



/ • • • / 



(k-1) (w x ) 



By duality, the optimal dual solution, w, satisfies 



k-1 

- (b 1 -Ax 1 )w^ < b;L-Ax k , yielding 



k-1 



k-1 




Therefore, x k e F P1 . Also x k is a convex 

combination of x^ c Fp 2 , so G F p* Q ED 
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Lemma 3.6: If V(MP(x)) < ub+e, then x and u are e-optimal 

primal and dual solutions to (P) . 

Proof: V (MP ( x) ) = f(u 1 k+1 ,x k ) = cx k -u 1 k+1 (Ax^-b^ (primal) 

= cx k + l [cx^-- (ub+e) jw^ (dual) 

= [cx k + (ub+e) ] [ 1 + £ W]J + (ub+e)] 

Rearranging yields 

cx k -(ub+e) = [ f (u;L k+1 , x k ) - (ub+ e)]/[l + I w^] and 
cx k < f(u 1 k+1 ,x k ) = V (MP (x) ) . If V (MP (x) ) < ub+e, 

then cx k < ub+e. If u* is the optimal solution of 
(D) , since x k e fp, and 

ub < u*b, u*b < cx k < ub+e < u*b+e. QED 

Finally, we show that once the master problem becomes 
bounded, it remains that way. 

Lemma 3.7: The value V(MP(x)) remains bounded once 

F(MP(x)) becomes bounded. 

Proof: V (MP (x) ) is finite if F(MP(x)) is bounded. Once 

F(MP(x)) becomes bounded, subsequent iterations add 
constraints and make existing constraints tighter 
by updating ub, further restricting the problem. 
QED 

The dual decomposition algorithm, DDC, based on the 
preceding theory follows. 
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Algorithm DDC: 

step 0: Specify u^ ' <0, e > 0, k = 1, ub = -°° 

step 1: Solve (SUB(u^ k )) for x^, U 2 ^ (i.e., f(u^ k / x k ) > 

ub+e,x k e Fp 2 ,x^ > 0) 

If infeasible, stop with (P) infeasible 

If u k b e (u 1 k ,u 2 k )b > ub, update incumbent 

solution. 

step 2: Solve (MP(x)) for u^^ +1 

I < ub+e, declare x k , u e-optimal for (P) , stop 
> ub+e and finite, x^ e Fp2, k = k+1, go to 

step 1 

->», use any u 1 k+1 < 0, ffu^+^x^) > ub+e, 

k = k+1, go to step 1 
is infeasible, STOP 

Since the master problem yields feasible primal 
solutions, it provides upper bounds, 

V (MP (x) ) > cx^ > u*b . 

Retaining all cuts may become too expensive in time or 

space in solving MP(x) . It is possible to conduct the 

algorithm as a heuristic by retaining only a fixed number of 
cuts. This may degrade the, quality of the solution 
obtainable at any iteration but does not prohibit 
convergence. A discussion of cut-dropping strategy is 
provided in the chapter on computational experience. 
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The value of e need not be fixed. A large value may be 
used initially which may be reduced as inconsisten-cies are 
encountered in MP(x) . Parametric analysis may be used to 
find a maximum e, or e may be reduced in a fixed algorithmic 
sequence of relaxations of cut aspirations. 

As a practical matter, the sequence of master problems 
behaves much like a first-order descent method in convex 
nonlinear programming. Each cut is a tangential 
approximation to the objective function as can be seen in 
Figure 3.1. Just as in nonlinear programming, we can expect 
oscillation as the solution sequence progresses. 




Figure 3.1 Tangential Approximation to Lagrangean 
Dual Function 
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In order to deal with such oscillation, a local 
neighborhood (or trust region) may be specified for u^. 
Initially, this neighborhood is relaxed to encompass values 
of u^ sure to contain the optimum (i.e., viewing u-^ as a 
penalty per unit of violation of joint capacity constraints, 
we wish to assure a feasible completion) . Subsequently, the 
trust region can be restricted when oscillation is apparent 
(e.g., whenever V(MP(x)) is non-monotonic improving). 

As an additional stabilizing influence, we introduce 
decomposition goals for the master problem variables. These 
goals may be violated at a small linear penalty cost. 

All the essentials for convergence are preserved as long 
as the cuts are satisfied hierarchically before the goals. 
In the same vein, the trust region and cut dropping 
heuristics do not present a serious impediment in practice. 
Brown, Graves, and Honczarenko (1983) developed similar 
mechanisms for primal goal decomposition of mixed integer 
models . 

DDC cuts can be restricted to Dantzig-Wolfe cuts if we 



force a 


maximal solution: 








DDC cuts 


cx^ - u^(Ax-^-b 1 ) 


> 


(ub+e) , 


1 = 1 , . . . , K j 


DDW cuts 


max u Q 








st 


cx^ - u 1 (Ax-^-b 1 ) 


> 


u o '+ U l b l' 


H 

II 

• 

• 

• 

* 
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In this sense the dual of the Dantzig-Wolfe master 
problem is equivalent to the DDW master problem. Taking the 
dual of (3.1) -(3. 4) yields 

max u 1 b 1 + u Q 
st u 1 (AX k ) + u Q < (cX k ) ' 
u x < 0 . 

Rearranging and writing each constraint individually, the 
result is u Q < cx^- - u^Ax^-) for 1 = l,k. Adding u 1 b 1 to 
each side results in 

u Q + u 1 b 1 < cx-*- - U]^ (Ax-*-- b^) for 1 = l,...,k . 

Before updating DDC to include all the above innovations, we 
introduce the following notation: 

X is the primal incumbent, 

X k is the matrix of extreme points generated up to 
iteration k, 

V is the upper bound, 

e,ef > 0 are initial and final convergence tolerances, 

R,Rf > 0 are initial and final trust regions, and 

n hold' n f — 1 • integer are number of cuts held, and 
total number of cuts allowed, and 

w e is exponential moving average of cut weights. 

The updated algorithm DDC1 becomes: 
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step 0: (initialize) 

Specify e > 0, ef > 0, R > 0, Rf > 0, n^oi^ > 1 
Set ub = -°° 



step 1: 


(solve aggregated problem) 

Solve min c fXf st NfX-L = b a , 0 < Xf < b 1 

where b a = ' b 2p 
P 

to find u±° < 0 

Set initial decomposition goals. 


step 2 : 


Solve (SUB (u-^) ) for x^, u 2 ^ 

Generate cut 

If u^b > ub, update incumbent solution. 


step 3 : 


(optimal capacity allocation) 

Set y k = CA(x k ) , solve RS(y k ) with Xy k optimal 
If V(y^) < V, update V, x 
Generate cut. 


step 4 : 


Solve (MP(x)) for u^ -1 " 1 

Update decomposition goals 

If V(MP(x)) < ub+e, set e = max(e/2,ef) 




If e > ef, repeat step 4 

If V (MP (x) ) not monotonic, set R = max(R/2,Rf) 
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step 5: If V (MP ( x) ) / > ub+e and finite, x k e F 



l 

i 



' ->oo, use u^ 4-1 < 0, f(u 1 ^ :+1 ,x 2 ) > ub+e 

( is infeasible, STOP 



Set cut weights w k+1 = . 8w k + . 2w k 

If solving relaxed master problem, recover primal 



solution (i.e., Xp^ = X^w^) 
If cXp k < V, update V, x. 



step 6: (termination tests) k = k+1 

If n < nf and ub < V-e go to step 2. 

Cut generation is performed as follows: 

Generate cut 
n = n+1 

if n > riftoia then 

locate slack cut with minimum weight and replace 
it; if no cut is slack, locate taut cut with 
minimum weight and replace at, also relax upper 
bound from recoverable primal solutions. 

Generate cut of desired class (e.g., Dantzig-Wolf e , 



DDC , . . . ) . 
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B. A PENALTY ALGORITHM FOR LINEAR PROGRAMS USING 

RESTRICTED SIMPLICIAL DECOMPOSITION 

Penalty functions have not been seriously considered as 
vehicles for large-scale mathematical programming in the 
past because they introduce nonlinearity (Geoffrion, 1970) . 
However, the recent advent of interior point or logarithmic 
barrier function methods has shown that non-linear 
approaches to linear programming are at least viable and 
offer attractive convergence rates and complexity properties 
compared with simplex-based methods (Karmarkar, 1984; Gill 
et al. 1986). In this section we develop the theory and 
present an algorithm based on penalty function concepts 
which is well-suited to large-scale optimization 
applications. The method has strong parallels to Dantzig- 
Wolfe decomposition, using restricted simplicial 
decomposition to produce a nonlinear master problem and 
linear subproblems. The process incorporates Lagrangean 
lower bounds in a natural way, but produces infeasible 
solutions in the master problem. We show how to use 
capacity allocation/restrictions on these infeasible 
solutions to produce improving upper bounds. The result is 
a convergent algorithm which displays a superior convergence 
rate in practice. To set the stage, we begin with a 
discussion of penalty function theory. 

Penalty algorithms approximate constrained optimization 
problems by unconstrained (or partially constrained) 
problems. This is done by placing the constraints into the 
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objective function with a penalty parameter which exacts a 
large price for any violation of the constraints. This 
relaxed approximation to the original problem becomes more 
accurate as the penalty parameter is increased. Thus, the 
penalty algorithm generates a sequence of infeasible points 
which converges in the limit to an optimal solution of the 
original problem. 

From the general form found in (1.6), we introduce the 
specific penalty form of a linear program, 

(PP) min q(h,x) = cx + P(h,x) 

st x e F 

where P(h,x) = — | |Q(x) | | | • | | t indicates the t th norm, 

Q (x) = (Ax-b 1 ) + , (Ax-b 1 ) + = max(0,Ax-b;L) , scalar h > 0, 

1 < t < 2, and F = (x|Nx = b 2 , 0 < x p < b^. T his is a 

common form found in any nonlinear programming text (see, 
for instance, Bazaraa and Shetty, 1979, or Luenberger, 
1984) . Recalling that J v is the set of currently violated 
constraints, the objective function may also be written as 

min q (h, x) = cx + J -Q-Wx)* 1 . (3.2) 

jeJ v 

Since the penalty terms are convex for h > 0, the 

objective function remains convex, and for t ^ 1, the 
objective function is differentiable everywhere. Convexity 
is proved for the quadratic penalty function in Appendix 
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A: convexity arguments for other forms may be found in 

Bertsekas (1982b) and Luenberger (1984), for example. 

Due to the convexity of q(h,x), we may employ the 
standard mathematical result that x is optimal in q(h,x) if 
and only if first-order stationary conditions are met: 

Vq(h,x) (x-x) > 0 (3.3) 

for all x £ F. For Vq(h,x)(x-x) < 0, (x-x) represents an 

improving direction. 

The following penalty function characteristics, stated 
in terms of PP, are taken from Luenberger (19 84) . Let {h k } 
and (x k ) be sequences of penalty parameters and associated 
optimal solutions, with h k+1 > h k , and h° > 0 . Then 

q(h k ,x k ) < q(h k+1 ,x k+1 ) 

P(x k ) > P(x k+1 ) 

cx k < cx k+1 



and 



cx* > q(h k ,x k ) > cx k , 

where x* is optimal in MCTP. Thus, solving the penalty 
problem for any h = h k > 0 produces a lower bound on MCTP 
and 



lim q(h k ,x k ) = cx* with x k £ x* . 

k-KX. 
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This large-scale nonlinear version of the original 
linear problem is useless without an effective means of 
solution. We base our solution algorithm on the promising 
method of restricted simplicial decomposition. 

Restricted simplicial decomposition (RSD) was developed 
by Hearn et al. (1984) to solve large-scale pseudo-convex 
nonlinear optimization problems. The method decomposes the 
original problem into a nonlinear master problem and a set 
of linear subproblems (assuming linear constraints) . At 
each iteration, the subproblems generate a new extreme point 
of the feasible region for the master problem, and the 
master problem minimizes the original objective function 
over a simplex of retained extreme points. The master 

problem in turn provides new cost information to the 

/\ 

subproblems via Vf(x) where f is the objective function of 

/\ 

the master problem and x is the current solution. The 
process terminates when no favorable extreme points remain. 
It is termed "restricted" since only a fixed number r of 
extreme points are retained at each iteration. 

When r = 1, RSD specializes to the algorithm of Frank 
and Wolfe, 1956. In this case the master problem becomes a 
minimization on the line joining the current point x and the 
extreme point just generated. Many researchers have noted 
the slow convergence of the Frank-Wolfe algorithm (Ali, et 
al., 1978; Meyer, 1974; Wolfe, 1970); a simple example taken 
from Wolfe (1970) makes this evident. Suppose we are trying 
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to find the point of a polytope which is closest to the 
origin: in Figure 3.1, this is the midpoint of the base of 
the triangle. At each iteration, linearizing the objective 
function and solving the resulting LP leads to vertex B or C 
of the triangle. Thus, as the algorithm progresses, the 
line search proceeds along a direction nearly orthogonal to 
the gradient Vf(x^) causing zigzagging. However, simply by 
retaining two extreme points and optimizing over the simplex 
formed by the extreme points and the current iterate, the 
problem in Figure 3.2 is solved optimally on the second 
iteration. Indeed, Hearn's computational results show 
significant improvement as r is increased from 1 (Hearn, et 
al., 1984). 



A 




Figure 3.2 Frank-Wolfe Zigzagging Example 
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Using this method on PP decomposes the problem into a 
master problem, which minimizes q(h,x) over the set a 
retained extreme points, and a subproblem consisting of a 
set of decoupled network flow problems with modified prices. 
The master problem is simple to solve since it has only a 
convexity constraint like (3.3) and nonnegativity 

constraints on a number of variables only equal to the 
number of retained extreme points. Furthermore, Hearn shows 
that the method is convergent even if the master problem is 
only solved approximately. 

The decomposition of the penalized MCTP is formed in the 
following manner. Let X be a matrix whose columns are 

retained extreme points of F, and r be the number of points 
retained. Then the master problem becomes 

(MP) min q(h k ,X k w) = cX k w + P(h k ,X k w) 

st: 1 * w = 1 

w > 0 . 

This resembles the Dantzig-Wolf e master problem, except that 

the penalized joint capacity constraints now appear in the 

~ r 

objective function. Let x k = £ x n k w n = be the optimal 

n=l 

solution to MP at iteration k, and J v be the set of 
constraints violated at x k . The corresponding subproblem is 

(SP) min q(h k ,x k )x (3.4) 

st x e F (3.5) 
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where 



yq(h^,x^) = c+VP(h^,x^) = c+h^Q (x^) "Q (x^) = c+h^ (Ax^-b) + A . 

Comparing this form to the objective of the Dantzig-Wolf e 
subproblem (3.5) reveals that -h^(Ax^ c -b 1 ) + is an estimate 
for the dual multipliers, u^. Thus, we make a new estimate 
of the optimal multipliers based on the violations in the 
current master problem. To avoid notational confusion, we 

/N ^ 

let u = h^ (Ax^-b 1 ) + in the rest of this section. Now, if 
x^ solves SP and 

A A 

Vq (h^, x^) (x^-x^) > 0 , 

/\ A 

then x^ solves PP for h = h^; otherwise (x^-x^) is a 

favorable direction, so we add x^ to the retained extreme 
points and return to the master problem. 

If x^ is optimal in PP for h = h k and x* is optimal in 
MCTP , then 

A 

q(h^,x^) < cx* , providing a lower bound on MCTP. 

A 

However, since x k is probably not an extreme point of F, we 

/s 

must identify the facet of F containing x^ and solve MP 
exactly to establish this bound. We use other information 
to provide intermediate bounds. 

Recall that the Lagrangean relaxation of the linear 
program, with u^ fixed, may be written 
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LR( Ul ) 



min (c - u-lA)x + u^! 
st x e F 



Using the set Jy, associated with any solution to the master 
problem, we let 






u 






| 0 if j / Jy 

I h(AjX-b X j ) t_1 if j e Jy , 



and rewrite SP as 



(3.9) 



min (c + u^A) x 
st x e F . 



Thus, we observe that 

A A 

V(u;l) = Vq(h k ,x k )x k - u^b^ , (3.10) 

so at each iteration we compute u^b^ as we compute new 
subproblem costs and adjust the global lower bound whenever 
(3.10) exceeds the current lower bound. 

Since the penalized problem is convex, it is also 

A 

possible to linearize the objective function at x^ to 
establish lower bounds via 

/N A A 

q(h^,x^) + Vq(h^,x^) (x-x^) < cx* . 

However, the following lemma establishes that the 

linearization is always dominated by the value of the 

^ /N 

Lagrangean relaxation with u ^ = h ( (Ax^-b^) + ) . For ease 
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of presentation here, we use the non-standard vector form 
[ (Ax k -b 1 ) + ] t , meaning [max ( 0 , AjX^^-b! j ) ] t for each element in 
the vector, and we drop k from x^ and u-^. 

A A 

Lemma 3.8: Let = h{ (Ax-b^ + ] t_1 at point x e F. Then 

AAA 

q(h,x) + V q (h, x) (x k -x) < Y(ui) < cx*, where 
x^ = argmin {Vq(h,x^)x st x e F} 

Proof: For u 1 = h{ (Ax-b^ + ] t_1 , vq(h,x) = (c+u^) . Now 

^ _ /N /N XX 

q (h , x) + v q(h, x) (x K -x) = cx + P(h,x) + cx k + 

VP(h,x)x k cx P(h,x)x = cx k + P(h,x) + 

A - A A A A 

■ P (h , x) (x*-x) = (c+u^AJx* + P(h,x) - VP(h,x)x since 

VP(h,x) = h[ (Ax-b^^) + ] t-1 A = u^A. By definition, 

V(u- L j = (c+u^Jx* - u 1 b^ < cx , so our result is 

A A A A A A 

proved if VP(h,x)x - P(h,x) > u^-^ Now VP(h,x)x 

- P (h , x) = u-lAx - ^h( [ (Ax-b) + ] t_1 ) ' (Ax-b) + = UqAx- 

A A ■ /\ /S 

^ u 1 (Ax-b 1 ) + . But, since u^ = 0 whenever AjX-b-Lj 

< 0, we see that 

A /X 1 A /X I "f“ 1 A A 1 A 

u^Ax - — Uq^ (Ax-b^) * = n(— — ) u^Ax + — u^b-^ . 

Since u^^AjX > u-^jbj for all j, then ( ^-) u^Ax + 

Ulbi > u^b^. QED 

The (relaxed) penalty form of MCTP produces a sequence 
of infeasible x's which is only guaranteed to converge to x* 
in the limit. Therefore, the penalty algorithm does not 
naturally produce intermediate feasible solutions or upper 
bounds. In general, this is unacceptable, so we rely on the 



improving nature of the sequence of x's, using projection 
and restriction to generate intermediate feasible solutions 
and upper bounds for MCTP. 

Suppose h is fixed and x solves min q(h,x) for x e F. 
Then, if x k solves the master problem at iteration k, either 
x k = x or a favorable extreme point is added to the master 
problem with 

q(h,x k+1 ) < q(h,x k ) . 

/s 

If x k = x, increasing h to h' > h and resolving MP over the 

same extreme points yields 

/\ ^ 

q(h,x k ) < q(h',x k+1 ) ' 

A « , 

By sampling the sequence periodically as x K x and forming 
a capacity allocation according to (2.11), y k = CA(x k ) (with 
scaling to allow non-integer solutions) , we will form 
allocations in RS(y) which improve as x improves, thus 
obtaining improving feasible solutions and, therefore, upper 
bounds. These resource allocations may be performed 
whenever desired; we choose to do an allocation every r th 
iteration where r is the number of retained extreme points. 

The following lemma shows that the allocations must 
eventually converge to an optimal allocation. 

Lemma 3.9. Let (x k ) be a sequence solving MP as {h k } -> co, 
y k = CA(x k ), and Xy k solve RS(y k ). Assuming 
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Proof : 



the solutions are scaled to provide integer 
allocations, as h k co , V(y k ) -> V* and 
x y k - x*. 

lim x k = x* for the penalty problem, where cx* is 
optimal in MCTP (Luenberger, 1984, Chapter 12). 
Since y k = CA(x k ) , as h ■+ 00 , y* = lim y k ^ CA(x*) . 
Since x* is feasible, Yp j * > x pj* for all P and j. 
Thus V(y*) = cXy* < cx* . But Xy* is feasible in 
MCTP by construction, so CXy* > cx*. Therefore 
CXy* = cx*, and Xy* solves MCTP. QED 

The algorithm proceeds by iteratively solving the 
subproblems and master problem, and periodically solving the 
restriction to generate new feasible solutions. The value 

of h is increased whenever 

A /\ 

1) Vq(h,x k ) (x k -x k ) > 0, 

2) RS(y) will be solved in the current iteration, or 

A 

3) min q(h,x k ) < V. 

The algorithm terminates when a feasible solution is 
generated in the master problem, since it must be optimal, 
or when 

(V - V)/V < e 

for some e > 0. 

The algorithm uses the following additional notation: 



76 



matrix of retained extreme points at iteration 



x k = current master problem solution 

x M = previous MP solution considered in current MP 

A 

X k = a matrix formed by augmenting X k with x M , written 
X k u x M 

conv(X) = convex hull of X 



J v = set of capacities violated by the k tk solution to 
MP 



a > 

x = 

t 

r > 

e > 

CA (x) 

v, v = 



1.0, a scalar multiplier for increasing h 

current incumbent for MCTP 

power of the penalty function, 1 < t < 2 

1, integer; maximum number of retained extreme 
points 

0 , a stopping parameter 

= a capacity allocation based on x using equation 

( 2 . 11 ) 

current upper and lower bounds on MCTP. 



The Algorithm RSD(P) : 

Input: The network T = (I,J), and joint capacity vector, b-^ 

and, for products p=l,...,|P|, a cost vector, c p , 
and supply/demand vector, b 2 

Output: Incumbent solution x, and incumbent value cx. 



step 0 (Initialize) : Select h° > 0, e > 0, a > 1, r > 1, 

set k = 0 

Solve LR ( 0) , with x° optimal; set V = V(0) 

Form J v ° = { j | Aj x° - b^j > 0} 

/A 

If Jy° = $ , stop with x° optimal 
Else, set y = CA(x°) 

Solve RS(y), with Xy° optimal; set V = V(y). 
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If (V - V)/V < e stop with incumbent x = Xy° 

Else, set X° = d, x M = x° 

step 1 (Solve Subproblem) 

Let x k = argmin{V q(h k , x k ) x | x e F} 

where 

Vq(h k ,x k ) = (c+u^^A) for u ^ = h ( (Ax^-b^) + ) t_1 
If q(h k ,x k ) (x k -x k ) >0, x k solves MCTPP for h = h k 
set V = q(h k ,x k ) 

If (V - V) /V < e, exit with incumbent x 
Else h k = a’h k 
Go to step 2 . 

Else compute = Vq(h k ,X* )x k - u^^b^ 

If V(u x ) > V, V = V(U X ) 

If (V - V)/V < e, exit with incumbent x 
Else 

i) If |X k | < r, X k+1 = X k u x k 

ii) If | X k | = r, drop the column of X k which had 

the smallest w n in convex combination forming 
x k and replace it with x k to form X k+1 

x M = x k 

A 

Set X k+1 = X k+1 u x M 
Step 2 (Solve Master Problem) 

A 

/V 

x k+1 = argmin (q(h k ,x), x e conv(X k )} 
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Discard all columns of X k+1 with w n = 0 
Form the set J v k+1 = {j|AjX k+1 - b^j > 0} 

If k is an integer multiple of r, do a resource 
allocation: 

Set y = CA(x k+1 ) 

Solve RS(y), with Xy k+1 optimal 
If V (y) < V, set V = V (y) , x = x y k+1 
If (V - V)/V < e, exit with incumbent x 
Else k = k+1 
Go to step 1. 

One possible difficulty with the penalty form of the 
objective function is that only the prices associated with 
the currently violated set of joint capacity constraints are 
adjusted at each iteration. There is no intermediate 
adjustment of multiplier values for constraints previously 
but not currently violated. Furthermore, convergence to 
optimal multiplier values for MCTP is guaranteed only in the 
limit, that is, as h + 00 . These potential problems are 

overcome by the augmented Lagrangean form of MCTP. As the 
name suggests, the objective function is formed by adding a 
penalty term to the standard Lagrangean dual form (Hestenes, 
1975) . Since the theory of this augmentation is developed 
in terms of equality constraints, we express the joint 
capacitation constraints of MCTP as 

(AjX-b'^j) + mj 2 = 0 for j e J . 
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The augmented Lagrangean problem is then 



min L(x,u 1 ,h) 
m, x 

= cx + l [u X j{ (Aj x -b 1 j) + mj 2 }+ih|AjX-b 1 j+mj 2 | 2 ] , x € F 
j e J z 

A 

with U]_, h > 0 . 

Bertsekas (1982b) demonstrates that an equivalent 
objective not requiring the variables m is 

L(x,U;]_,h) = cx + l { [max(0,u 1 j+h(AjX-b 1 j ) ) ] 2 -u 1 j 2 } . 

jeJ 



This amounts to a multiplier update 

^ - 

u ^" 1 " 1 = max{0,uK +h(AjX-b 1 j)} for all j e J . 
Using vector notation, the problem is stated as 



min L(x,u-L,h) = cx + P(x,u 1 ,h) 
st x e F 

A 1 ^ A 

where P(x,u-L,h) = 2h (|| u 1 '|| 2 - IIU 3 JI 2 ) 

/\ /v 

and U]^' = [U 2 + h(Ax-b 1 )] + . 

By analogy to the penalty version of the algorithm it is 

/\ /s 

easy to see that the subproblem, min V L(x,u lf h)x for x e F 

A 

may be restated as min(c+u 1 A)x, st. x c F, and a lower bound 

A A A 

on MCTP is derived by computing min VlCx^u-l,!!) x^-u^b-^. 

Bertsekas points out certain advantages to solving this 
augmented Lagrangean form of the problem. First, the method 
will converge to an optimal solution for a finite value 
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of h. Second, the rate of convergence of this augmented 
form is superior to the penalty form, which depends on the 
rate of increase of h, thus substituting convergence rate 
for ill-conditioning concerns in the master problem. 

We conclude this section by demonstrating that RSD(P) is 
a convergent algorithm. Hearn, Lawphongpanich and Ventura 
(1985) present a proof that RSD either terminates in a 
finite number of iterations or generates a sequence {x k } 
for which any subsequential limit is a solution. For the 
penalized version of the MCTP the proof may be simplified, 
relying on the simple assumption of a bounded feasible 
region (i.e., 0<Xp<b 1 <°°. 

In preparation for the main result, we first show that 
solving the master problem for fixed h produces a sequence 
of improving solutions (Hearn, et al, 1985) . 

A 

Lemma 3.10: Let x k solve the restricted master problem at 

iteration k, with h = h. If x k is not 

^ ^ 

optimal in PP, then q(h , x k+1 ) < q(h , x k ) . 

Proof: At iteration k, x k is the current iterate and x k 

/\ 

solves the subproblem with \q(h, x k ) (x k -x k ) < 0. 

Thus x k e X k+1 and X k+1 = X k+1 u x k . Now let 

x k+1 = argmin{q(h, x) s.t. x e conv(X k+1 ) ) . Then, 

q(h,x k+1 ) < q(h,x k ) since x k e X k+1 . Now, if 

— /\ A ^ 

q(h,x k+1 ) = q(h,x k ), then x k also minimizes the 
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master problem at iteration k+1 and Vq (h, x k ) (x-x k ) 

> 0 for all x e X k+1 . But since x k e X k+1 , this 

t /\ 
contradicts the assumption that (x k -x k ) is an 



improving direction. QED 



The main convergence result is stated as follows: 



Lemma 3.11: Let F be a bounded, non-empty set of feasible 

single commodity network flows. The RSD(P) 
algorithm either solves PP for h = h in a 



finite number of iterations or converges to 
an optimal solution x as the limit of an 



infinite sequence. Furthermore, as h is 
increased to °° , q(h,x) ->• cx* and {x} x*, 
where x* is an optimal solution to MCTP. 



Proof: Let x solve min q(h,x) st x e F. Since F is non- 



empty and bounded, choose any starting point 
x° e F, giving q(h,x) < q(h,x°) < °°. Then at any 

A 

iteration either x k = x or 



q(h,x) < q(h,x k+1 ) < q(h,x k ) < q(h,x°) . 

oo 



Letting d k = q(h,x k ) - q(h,x k+1 ) , then d k = 

k=0 

q(h,x°) - q(h,x), so lim d k = 0. Since only x 

k ~’’°° 

varies, lim |x k -x k+1 | = 0 and, since lim q(h,x k ) = 

k+oo _ * k->°° 

q(h,x), then {x K } x. 

A - — 

Since x <r F, we set h to h > h and continue 



the algorithm. By Lemmas 1 and 2, Section 12.1 of 
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/N /S 



Luenberger (1984), q(h,x) < q(h,x) < cx*. Finally 
by Section 12.1, Theorem 1 of Luenberger (1984) 
letting h ^ 00 , we have lim q(h,x) = cx* and 

v 

(x) -*■ x*. QED 

It is not necessary to solve q(h,x) to optimality for 
each value of h to obtain convergence since all extreme 
points of the current simplex are elements of F. Thus, 

/s — 

whenever it is desirable, we may set a new h > h and resolve 

A 

the master problem with new objective function min q(h,x) 

/\ 

over the same simplex to obtain a new x^. The subproblem 

/N /\ 

costs are then based on Vq(h,x^) . Appropriate opportunities 
to increase h have already been discussed. 

The preceding proof demonstrates that the algorithm is 
theoretically convergent without resource directive capacity 
allocations. Indeed, Bertsekas has shown that, for proper 
choice of sequence {h}, penalty algorithms may be 
quadratically convergent (1982b) . However, in practice, 
awaiting convergence for large problems may be impractical. 
Therefore, we use the resource allocation steps to help 
obtain near-optimality quickly. As with the resource- 
directive approach discussed in Chapter II, we always employ 
simple reallocation to obtain the best available solutions. 
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IV. COMPUTATIONAL EXPERIENCE 



The algorithms presented in Chapters II and III have 
been coded in FORTRAN and tested on various versions of a 
large scale MCTP using an IBM 3033AP under both VM and MVS 
operating systems. For comparison, some four and ten 
commodity problems have been solved using the X-system of 
Brown and Graves (1974), which exploits the generalized 
upper bound structure of the complicating constraints in the 
MCTP and employs advanced starting solutions of the network 
constraints. These control problems were solved using an 
IBM 3081K, which is approximately twice as fast as the 
3033AP for the X-system. 

Throughout this chapter the following acronyms are used: 
DDC = dual decomposition, 

RDLB = resource direction with Lagrangean lower bounds, 
RSD = restricted simplicial decomposition, 

RSD(A) = RSD of the augmented Lagrangean form, and 
RSD ( P) = RSD of the penalty form. 

The particular test problems used are described in Sec- 
tion A, issues concerning individual procedures are studied 
in Section B, and comparisons among algorithms are presented 
in Section C. Whenever the word "gap” is used to describe 
the quality of a solution, it means (V-V)/V*, where V, V, 
and V* are the upper and lower bounds, and optimal solution. 
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A. TEST PROBLEMS 



We demonstrate our methods on a transshipment model for 
delivery over time of military products from production and 
storage locations to overseas locations to support theatre 
operations. The model covers five physical echelons, 
including production plants, storage depots, ports of 
embarkation, ports of debarkation, and geographic field 
locations. Road, rail, sea, and air transportation are 
modelled, and product demands are time-phased. Capacitation 
occurs primarily on sea and air links, and as throughput 
capacities on transfer points, requiring replication of some 
echelons . 

The objective of the model is to minimize deviation from 
on-time deliveries. This is accomplished through a 
specified set of backlogging arcs with graduated penalties 
and a system of penalized "artificial" arcs which direct 
flow around the network to satisfy "undeliverable" demands 
and to equilibrate supply and demand. The products all use 
a common unit of flow and incur a common cost on each arc. 

The network is abstracted in Figure 4.1 — a more detailed 
description of the model may be found in Staniec (1984). 
Computational tests use an underlying network of 3,300 nodes 
and 10,400 arcs, of which 1,071 are subject to non-redundant 
joint capacity constraints. A set of basic test problems of 
between four and ten products is used for detailed inter- 
and intra-method testing. Results of these tests are 
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reported together in the following sections. Three 
competitive methods are tested on a final 100 commodity test 
problem in the last section to illustrate the effectiveness 
of these methods on a typical, large-scale problem. 



AMMO 

PLANTS 



DEPOT 



DEPOT 



POC 



POE' 



POO 



POO 



GLOC 



GLOC 




Figure 4.1 Simplified Ammunition Distribution Model 



The scope of our computational study is limited to 
problems drawn from this one generic model, but we think 
that these problems are generally quite difficult to solve. 
Because of the explicit system of. penalized artificials, the 
costs range over five orders of magnitude, which have the 
potential to cause difficulty in real arithmetic. Also, 
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because the model has multiple shipping modes, covers nine 
spatial echelons, 21 time echelons, and has a large 
backlogging system, repair of joint capacity violations 
requires extensive effort. Therefore, these test problems 
present a rigorous challenge for the solution algorithms. 

We have graded the basic test problems used as easy (E) 
or hard (H) based on two criteria. The first criterion is 
the number of violations encountered in the first 
relaxation. This parallels the criterion of Ali, et al. 
(1984) in evaluating direct simplex methods used on some 
medium-sized MCTPs: the number of tight joint capacity 
constraints. The second criterion is the magnitude of the 
initial violations, which is an indirect measure of the flow 
changes required to reach the optimal solution. Together 
these criteria affect the size of the gap between the 
initial finite upper and lower bounds. This is evident in 
the RSD and RDLB procedures, where the initial feasible 
solutions are based upon allocations from the first 
relaxation: the larger the violations, the poorer the 
initial allocations. Among the problems we test, the most 
difficult is 10H, on which RSD and RDLB generate a 56% 
initial gap. 

Test problem specifications and direct solution times by 
the X-system are presented in Table 4.1. Initial gaps are 
from the RSD algorithm. 
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TABLE 4 . 1 



OPTIMAL SOLUTIONS FOR BASIC TEST PROBLEMS 



PROD. 

GRADE 


INIT . 
GAP 

m 


OPTIMAL SOL'N 


INIT. LARGEST 
VIOL. VIOL./ 

ARC CAP. 


X-SYSTEM 
TIME (SEC*) 
COLD HOT 


4E 


1 


340,916,981 


3 


844/1458 


1021 




10E 


37 


367,135,103 


11 


4 , 998/10,500 


2586 




4H 


14 


130,739,585 


9 


7,185/10,500 


461.7 


496.5 


6H 


35 


138,525,451 


13 


10,048/10,500 


3015.7 


581.0 


8H 


45 


143,526,951 


15 


13,861/10,500 


2431.3 


1067.6 


10H 


54 


169,532 ,339 


20 


13 ,861/10,500 


5352 


1393.5 


(*IBM 308 IK) 










The 


4-commodity problems 


have approximately 


13 , 200 



constraints and 41,600 variables: the 10-commodity problems 
have about 33,000 constraints and 104,000 variables. Hot 
start solutions use a pure network basis crash, and all 
these solutions factor the joint capacitation constraints as 
generalized upper bounds. However, pure network 
factorization is not employed. 

The principal motive for these runs is to establish 
completely optimal yardsticks for the competing indirect 
methods . 
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B. INTRA-ALGORITHM STUDIES 



We first discuss the selection of parameter settings for 
the RSD algorithms, and then' we discuss issues involving 
DDC, including obtaining bounded master problem solutions 
and cut dropping. For RSD(P) , the parameters of interest 
are the initial value of h, the h multiplier a, and the 
penalty exponent, t. For RSD (A) , we examine h and a, fixing 
the exponent at 2.0. In our tests, initial values for h are 
taken as a fraction of the largest cost used in the network, 
in this case a bypass penalty cost of 62,700. Values tested 
in RSD ( P) were 627.0, 62.7, and 6.27. Other researchers 
(e.g., Bertsekas, 1982) suggest penalty multipliers in the 
range of 4 to 10 when solving the problem optimally for each 
value of h. However, since we increase h frequently based 
on periodic reallocations and lower bound adjustments, 
penalty multiplier values between 1.5 and 4 are 
investigated. Finally, to represent the range of possible 
exponents we test the algorithm for t = 1.1, 1.5 and 2. The 
exponent t = 1.0 is not useful in this algorithm since it 
results in u-^j = h for all j in J y at every iteration. 

Table 4.2 summarizes response to changes in the size of 
the penalty multiplier in RSD(P) . The results indicate the 
upper and lower bounds and final gap based on 21 iterations 
of the algorithm for problems 4H and 10H. A multiplier of 
1.5 seems to work best in the smaller problem, while 4.0 
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TABLE 4.2 



RESPONSE TO CHANGING PENALTY MULTIPLIER 



PROD. 


MULT. 

a 


INIT 

h 


EXP 

t 


LB 

( 10 ) 


UB 

( 10 ) 


GAP 

m 


4H 


2 . 0 


6.27 


2 . 0 


130.4 


130.74 


.273 




1.5 


6.27 


2 . 0 


130.67 


130.75 


. 061 




2 . 0 


6.27 


1.5 


129.7 


130.74 


.78 




1.5 


6.27 


1.5 


130.2 


130.76 


. 44 


10H 


4 . 0 


6.27 


2 . 0 


162 


170.1 


4 . 79 




2 . 0 


6.27 


2 . 0 


163.4 


170.6 


4 .22 




4 . 0 


6.27 


1.5 


150.2 


170.1 


11.68 




2 . 0 


6.27 


1.5 


160.6 


170.8 


5.99 




4 . 0 


62.7 


2 . 0 


156.2 


172 


9.17 




2.0 


62.7 


2 . 0 


153.9 


171.2 


10 . 09 




4 . 0 


62.7 


1.5 


163.9 


170.6 


3.93 




2 . 0 


62.7 


1.5 


162.1 


170.5 


4 .91 



generally works best in the larger problem. However, in 
both cases, the response is not substantially worse for a 
multiplier of 2.0. Therefore, we fix the multiplier at 2.0 
and concentrate further studies on the other two parameters. 

Recalling that in the subproblems the gradient of the 
penalty term provides an estimate of the optimal Lagrange 
multipliers , 



u^ = h [ (Ax^-b^ + ] t 1 , 

it is evident that h and t work in concert to affect the 
magnitude of the multiplier estimates at each iteration. We 
want a combination which produces multiplier estimates large 
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enough to generate new extreme points, yet small enough to 

^ , 

allow good lower bound estimates. That is, as u^b! grows 
large, V(u 1 K ) degrades as a lower bound estimate. Table 4.3 
and Figure 4.2 show the response of RSD(P) to changes in h 
and t. Results are for 21 iterations of the algorithm, with 
the penalty multiplier fixed at 2.0 for all cases. In each 
case we retain a maximum of seven extreme points plus the 
current master problem solution, and perform a primal 
resource allocation every seventh iteration. 

We summarize the computational results as follows. 
First, there is a distinct advantage to starting with a 
small penalty parameter. Initial values of h = (max. arc 
cost) x 10“ 3 or 10“ 4 provide superior overall performance in 
the test problems. Second, the penalty exponent t = 1.1 
shows a definite overall disadvantage to values of 1.5 and 
2.0. Both t = 1.5 and t = 2 perform well for an appropriate 
matching value of h, but we favor t = 1.5 because it 

provides the best overall final results, apparently reducing 
the dominance of the most grossly violated arcs when 
= h ( (AjX-b^) * ° is computed. 

Figure 4.2 graphically displays the interactions between 
h and t in RSD(P) by depicting upper and lower bounds 
relative to the optimal solution of problem 10H. The heavy 
black lines suggest that the best response is with h = 62.7, 
t = 1.5, but it appears that any choice of 0 < h < 70 and 
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Figure 4.2 Response to Changing Parameters, 

RSD(P) on Problem 10H 

1.5 < t < 2 will perform well. Note that the upper bound is 
nearly optimal throughout the suggested range. 

The parametric responses reported here suggest a simple 
method for selecting appropriate starting parameters for the 
penalty algorithm. First, we choose the h multiplier a = 

A 

2.0 and set 1.5 < t < 2.0. Then we make an estimate, u^ j , 
of the optimal dual variable associated with the most 
violated constraint found when LR(0) is solved, and select h 
so that 
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RESPONSE TO CHANGES IN PENALTY PARAMETER AND EXPONENT 

FOR RSD (P) 
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h * max[ (Ajx-b^^j ) + ] t-1 ~ u^ . 

Better guesses of evidently evoke better performance of 
the algorithm. In this case, we simply use a fraction of 
the largest cost as our estimate; more sophisticated 
estimates may be made by examining differences in costs 
between least and most expensive paths through the network, 
or by solving a single problem with aggregated 
multicommodity supplies and demands. One point is clear 
from the data: it is better to underestimate the best 
initial value for h than to overestimate, because the 
algorithm is quickly self-correcting for low estimates. 
That is, if the initial penalty is so small that it does not 
produce a new extreme point, the algorithm immediately (and 
repeatedly) increases the penalty parameter until it does. 
Furthermore, when the initial multiplier estimates are good, 
the subproblems generate extreme points more closely related 
to an optimal solution. Consequently, both lower bounds and 
primal solutions quickly benefit. On the other hand, 
excessively large penalties generate extreme points with 
little relation to the optimal solution, producing poor 
lower bounds and degrading the primal solutions obtained 
through resource allocations. 

Figure 4 . 3 shows the interactions of parameters for 
RSD(A). In this case, we use a constant penalty exponent t 
= 2.0, and vary h and a. As indicated by the response 
surfaces in the figure, the best bounds are obtained for a 
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combination of small initial penalty parameters and smaller 
multipliers. We fix the multiplier at 1.2 in subsequent 
tests. Again, the initial value of h may be calculated from 
a u^ estimate using 

h x [max (AjX°-b 1 j ) + ] ~ u^ , 

and the algorithm quickly corrects for underestimates but 
recovers slowly from overestimates. 

The computational experience presented by Hearn et al. 
(1984) indicates that performance improves as the number of 
retained extreme points is increased. In fact, if enough 
points are retained, theoretically the heuristic becomes a 
finite algorithm. Performance in our tests improves for up 
to about eight points, and then levels off. As the number 
of retained points increases, the time spent in the master 
problem increases. In the following section, we fix the 
number of retained points at eight and obtain quite 
satisfactory performance. 

Four performance issues concern us with algorithm DDC. 

The first is reaching a bounded condition in the dual 
master problem (or, equivalently, reaching a feasible 
solution to the Dantzig-Wolfe master problem) . Since the 
constraint set of the dual master problem may be viewed as 
forming a piecewise linear tangential approximation of the 
Lagrangean dual function, we obtain no feasible primal 
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Figure 4.3 Response to Changing Parameters, 

RSD(A) on Problem 4H 

solution until the set is bounded. In problem 10H, for 

instance, this requires 6 iterations. 

One method to obtain early boundedness is to perform a 
resource allocation from the first relaxation, as in the RSD 
algorithms, and append this information to the DDC master 
problem as a "pseudocut." This method does provide an early 
feasible solution and the first pseudocut seems to remain 
binding for many iterations. However, our results show that 
it does not improve the point at which boundedness is 
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naturally achieved in the master problem using cuts 
generated by the subproblems, nor does it affect the 
subsequent convergence trajectory. 

Second, we must make an intelligent choice of the 
decomposition tolerance e in the cut thresholds ub+e. e 
acts as a relaxation parameter. If e is too small, we 
restrict the size of our improvements in the dual 
multipliers and inch uphill; if e is too large, we overshoot 
good multiplier choices and may oscillate through a sequence 
of extreme points which generate poor cuts and consequently 
poor dual solutions. Moderation is a virtue in the choice 
of e. For this set of test problems, we initially set e = 
100,000 since no dual variable is expected to exceed 62,700 
and obtain reasonable performance by reducing e by half 
whenever e-optimality is achieved (i.e., whenever the cuts 
cannot be satisfied) . A final value ef terminates this 
sequence of master problem relaxations. 

Third, the decomposition goals are a daunting 
complication at first glance. However, a goal constraint 
for each master problem variable can be incorporated as a 
generalized upper bound, and the resulting master problem 
solved with very little additional effort. 

Finally, we must consider the effects of accumulating 
too many cuts in the master problem. As the number of 
retained cuts grows, more time and space are required to 
solve the master, and more time is required to regenerate 
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solutions from peripheral storage. However as the 
approximation to the dual response surface improves, many of 
the earlier cuts become non-binding, either temporarily or 
permanently. Figures 4.4 and 4.5 show cut histories for 
problems 4H and 10H, respectively. In these figures, the 
height of the "skyscraper" indicates the weight of the cuts 
at each iteration. Both results show that only a few cuts 
reenter a subsequent solution after once becoming slack, and 
then only at a small dual weight. 




Figure 4.4 Active Cuts at Each Iteration 
DDC on Problem 4H 
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We take advantage of this property by restricting the 
number of cuts retained to n^old = 20. We also use an 
exponential moving average of the previous cut weightings 
(master problem dual variables) to determine which cuts to 
drop. In the (unlikely) event that a taut cut must be 
replaced, the (upper bound) value of a recoverable primal 
solution must be relaxed accordingly. 




Figure 4.5 Active Cuts at Each Iteration 
DDC on Problem 10H 
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The master problems are best solved by using a basis 
crash from the preceding solution in the sequence (except in 
the rare case that a taut cut has been replaced) . 

For our problems, the master problems have as many as 
n hold = 20 (dense) cut constraints, and 1,071 decomposition 
goal (GUB) constraints, one for each of the 1,071 variables. 
Master problem solution times average 0.2 seconds (IBM 3033 
AP) . 

The pure network subproblems have 3,300 nodes and 10,400 
arcs, and solve in about 1 second per commodity. To reduce 
these dominating computation times, a hot start mechanism is 
used which initially restricts the network to those 
variables known to have had positive flow in any prior 
solution for that commodity. This restriction tends to 
reduce solution time as more experience is gained over 
iterations. After about 10 cuts, network subproblems rarely 
use any new arc not used before. 

The networks subproblems are much more expensive to 
solve in DDC than the LP master problems. This is reversed 
from experience with primal decomposition, for which the 
master problems are commonly mixed integer LPs (e.g., see 
Geoffrion and Graves, 1974, or Brown, Graves and 
Honczarenko, 1983) . Unfortunately, further mechanisms to 
reduce network solution times require additional storage, 
and we have limited our implementation to operating with a 
default 2 M-byte VM/CMS region. Overlays are used for each 
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commodity subproblem and the master problem. We do not want 
to limit the number of commodities and have therefore used 
no data structure spanning commodities. 
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C. INTERMETHOD COMPARISONS 

We now turn our attention to comparisons among the 
various algorithms. We establish acceptable solution gaps 
of 1% and 4% for the four and ten product problems, 
respectively, for two reasons. First, for such large-scale 
planning models, these levels of accuracy are adequate. 
Second, in order to maintain integer arithmetic in the 
network problems, we round multiplier estimates and capacity 
allocations, inducing the possibility that exact optima to 
the original problem have been excluded. 

Figures 4.6 and 4.7 show the significant computational 
advantage of RSD(A) over RDLB. For both test problems 4H 




Figure 4.6 Solution Trajectory Comparison (Problem 4H) 
RSD(A) vs. RDLB 



102 



► 



o 




Figure 4.7 Solution Trajectory Comparison (Problem 10H) 
RSD(A) vs. RDLB 

and 10H, the performance of the RDLB is much worse than 
RSD(A) because the subgradient reallocations are generally 
ineffective. In fact, although the subgradient realloca- 
tions lead to an optimal solution for problem 4E (not shown) 
no improvements are found for 10H, and improvements totaling 
.007% out of an 11.8% gap are made in 4H, only by using an 
exceedingly small step size. 

The poor primal performance of RDLB is compounded, by 
slow convergence of the Lagrangean problem. As indicated by 
the figures, it usually requires accumulation of three to 
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five new subgradients before there is enough information 
available to find a good conjugate direction and 
substantially improve the lower bound. Since only a few 
costs are typically changed in solving the second set of 
network subproblems for the quadratic approximation to the 
line search, we use the previous bases as a "hot restart," 
but the time savings is not adequate to make the algorithm 
competitive in performance with the other algorithms. Even 
though other lower bounding strategies are available for 
testing, no further tests are performed on this algorithm 
because of its combined poor performance on both lower and 
upper bounds . 

For the two test problems shown, the RDLB method 
requires significantly longer to do less. The algorithm 
terminates with final gaps of 5.83 and 10.7 percent, both 
unacceptable by our stated standard. The primal bound is 
relatively poor in both problems, corroborating the results 
of other researchers (e.g. , Allen (1985)) which show 
subgradient-based resource direction unable to achieve an 
optimal solution. 

We now compare RSD(A) to RSD(P) (t = 1.5). In Figure 
4.8, RSD(A) takes about 60 seconds on problem 4H to reach a 
solution within .15% of optimal, while RSD(P) reaches a .69% 
solution in under 120 seconds. In Figure 4.9, the augmented 
form reaches a 4% solution to problem 10H in about 270 
seconds and a 3.56% solution after 21 iterations in about 
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Figure 4.8 Solution Trajectory Comparison 
RSD(A) vs. RSD ( P) , Problem 4H 

440 seconds. The penalty version takes nearly 470 seconds 
to reach the 4% level, closing at 3.93%. The X-system on an 
IBM 3 08 IK solves this problem optimally in 1393.5 seconds 
using the network hot-start procedure (5,352 seconds 
without!). Considering the difference in computer speeds, 
RSD (A) reaches an acceptable solution in 1/10 the time of 
the X-system. 

In comparing RSD (A) and RSD(P) , we note not so much the 
difference in performance times as the movement of the lower 
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Figure 4.9 Solution Trajectory Comparisons 
(Problem 10H) RSD(A) vs. RSD(P) 



bounds. The bound for RSD(A) climbs more quickly than 
RSD(P) apparently due to better instantaneous multiplier 
estimates and the fact that h is increased in smaller steps 
in RSD(A), resulting in a better-conditioned master problem 
at each iteration. Complete convergence of bounds is not 
anticipated for either algorithm, because we are rounding 
the multiplier estimates to perform integer arithmetic in 
the subproblems, we are not scaling the subproblems, and we 
are retaining only eight points. However, the solutions 
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obtained by both algorithms are adequate, especially since 
the primal incumbents invariably seem to be very near 
optimal . 

Finally, Figures 4.10 and 4.11 compare the performance 
of RSD(A) to the performance of DDC. For both problems 4H 
and 10H, the RSD(A) algorithm significantly outperforms the 
dual algorithm. While RSD takes about 60 seconds to reach a 
solution within .15% of optimal in 4H, DDC required 160 
seconds to reach a 1% solution and about 180 seconds to 
reach a gap comparable to RSD (A) . 



o 




Figure 4.10 Solution Trajectory Comparison 

RSD (A) vs. DDC, Problem 4H 
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Figure 4.11 Solution Trajectory Comparison 

RSD(A) vs. DDC , Problem 10H 



In 10H , the comparison is similar. RSD(A) reaches a 4% 
solution in about 180 seconds and 3.56% in 440 seconds. DDC 
terminates at 650 seconds after 50 iterations with an 8.5% 
gap. These results are especially significant since the 
dual decomposition algorithm has a reputation for rapid 
initial progress. It is evident from both test problems 
that the RSD(A) algorithm performed better both on the upper 
and lower bounds. 
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Table 4.4 summarizes the final results for all 
algorithms and basic test problems run on an IBM 3 033AP. 
Notice that RDLB runs reasonably well on 4E, but its times 
and solution quality are unacceptable for the hard problems. 
On the other hand, RSD(A) produces near-optimal primal 
solutions, acceptable bounds, and fast times for each test 
problem. The dual decomposition produces good final 
results, but through a poorer trajectory, and shows signs of 
laboring on problem 10H. 

Finally, we construct a problem of 100 products to test 
DDC and RSD(A) . It has 31 initial capacity violations with 
a maximum violation of 238% of arc capacity. Due to the 
increase in problem size, the initial value of h is reduced 
by a factor of 10 in RSD(A) : no other changes have been 
made. Figure 4.12 shows that RSD(A) reaches an acceptable 
4% solution in about 1000 seconds and concludes 21 
iterations in 3000 seconds with a 1.5% gap. DDC terminates 
at 4090 seconds and 50 iterations with a 12.15% gap 
remaining. We note that a previous effort on this same 
problem using a resource-directive algorithm achieved an 
11.8% solution in 1000 seconds (Staniec, 1984), but made no 
further improvements in an hour of computation. The RSD(A) 
algorithm achieved an acceptable solution in less than 17 
minutes for a problem of some 330,000 constraints and 
1,040,000 variables. 
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SUMMARY DATA: COMPARATIVE PERFORMANCES 
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V. CONCLUSION 



We have presented three algorithms for solving large- 
scale linear programs that fall within the general framework 
of decomposition, with specific application to the MCTP. 

The first algorithm, a resource-directive decomposition, 
has contributed a new, simplified method of projecting 
subgradient reallocations in the primal problems, and an 
improved method for terminating the algorithm via conjugate 
subgradient directions and approximate line searches in the 
Lagrangean lower bounding routine. However, the method is 
not computationally effective due to inability to find 
improving subgradient reallocations and due to computational 
burden in the line searches. 

The second algorithm is a dual decomposition using a 
master problem which is the relaxed dual of the standard 
Dantzig-Wolfe master problem. Tests show the method to be 
computationally effective, but with slow convergence for 
very large problems. 

The last algorithm (with two variants) is a new, non- 
linear price-directive decomposition using a penalty 
transformation of the original problem. Computational tests 
show the method to be approximately ten times faster than a 
direct simplex-based algorithm, and 2-3 times faster than 
the dual decomposition tested. 
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The initial success of the RSD algorithm prompts several 
areas for further investigation. First, the test suite is 
limited. We intend to test the algorithm on other large- 
scale problems 

Second, Hearn et al. (1984) suggest that quadratic 
approximations to the master problem may speed its solution 
significantly without degrading convergence rate. Bertsekas 
(1982a) has described a projected Newton method for solving 
non-linear objectives in the presence of simple constraints. 
We propose development of a form of the Bertsekas algorithm 
which is capable of handling the penalty transformation of 
MCTP . 

Third, it is reasonable to assume other non-linear 
objectives may have attractive features. For instance, the 
logarithmic barrier function has recently been the subject 
of extensive research for solving linear programs (see, for 
instance, Gill et al. (1986) or Karmarkar (1984)). We 
propose investigating interior point forms of the algorithm, 
with the possibility of developing a hybrid 
interior/exterior point algorithm, taking advantage of both 
penalty and barrier theory to solve linear programs. 

Fourth, we propose a hybrid algorithm using the 
augmented Lagrangean form of RSD combined with dual 
decomposition to take advantage of the best properties of 
both algorithms. Via RSD, we quickly generate good 
estimates of the optimal dual multipliers, and therefore 
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favorable extreme points. However, convergence slows due to 
the restricted number of extreme points retained. 
Increasing the extreme point count equates to increasing the 
time spent solving the master problem. However, these 
favorable extreme points may all be used to produce cuts for 
the dual decomposition master problem, which can be solved 
more quickly and precisely via linear programming. We 
anticipate the result to be a hybrid algorithm which has 
both favorable initial and terminal convergence properties. 

Finally, we consider an extension of the algorithm from 
the shipment planning to a shipment scheduling framework, in 
which, for instance, we must make binary decisions on sea 
shipment arcs to account for shiploads of material. This 
will entail surrounding the RSD algorithm with a shell 
capable of interpreting information to make binary 
selections. One possible way to accomplish this is to 
develop a specialized version of the cross-decomposition 
algorithm of Van Roy (1986) which incorporates RSD. 

In conclusion, the result of this research has been a 
new, computationally attractive algorithm for large scale 
linear programs using non-linear methods. Also, through the 
computational results produced, it has documented some of 
the shortcomings of the subgradient approach in resource 
direction. It is evident that more information (e.g., a 
formal Benders master problem) is required to solve 
difficult problems by resource direction. 
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APPENDIX 



CONVEXITY OF THE QUADRATIC PENALTY FUNCTION 

The following lemma demonstrates that the quadratic 
penalty form of the MCTP is everywhere convex, which makes 
it appropriate for the application of RSD . Convexity of 

other penalty forms may be proved in a similar manner. 

Lemma: The function, q(h,x) is convex for all 

x F = (x|Nx = b 2 ,0 < x < b]^}. 

Proof: q(h,x) = cx + P(h,x) , where 

P(h,x) = jh[ (Ax-b]^) + ] ' (Ax-b^ + . Since cx is 

trivially convex and the sum of convex functions is 
convex, we need only show P(h,x) to be convex; that 
is, that 

P(h,x) > P(h,x) + V x P(h,x) (x-x) , for all x £ F 
where VP x (h,x) = h[ (Ax-b^) + ] 1 A 

Then we define J v = {j|AjX-b 1 j > 0} and revert to 
summation form. P(h,x) is convex if 

I h[ (AjX-b X j ) + ] 2 > l Ih(AjX-b-Lj) 2 + £ h(AjX-b X j ) Aj (x-x). 

j ^ eJ V 
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Rewriting the left side as 



h[ (AjX-bij)- 1 -] 2 - l h[(AjX-b 1;j ) + ] 2 + l h [AjX-b X j ) + ] 2 

j eJ v j/ J v 



and noting that T h [ (AjX-b^i ) + ] 2 > 0 for all x 

j/Jv 



we may prove convexity if 



! ^h[ (AjX-b X j ) + ] 2 >1 (IhfAjX-bij) 

j €J v j eJ V 



+ h(AjX-b X j ) Aj (x-x) } 



or, considering additivity of convex functions, if 

Jh[ (AjX-bijJ + J 2 > |h[ (AjX-bijJ+J 2 + h (AjX-bij ) + Aj (x-x) 

for all x e F, and j e J v . 

If (AjX-b 1 j) + = 0, the right side of the 

inequality is 0, and h [ (Aj x-b^^j ) + ] 2 > 0 trivially. 
Now, for (AjX-b 1 j) + > 0, letting Uj = AjX-b^, 

dropping h we write 

^(AjX-b-Lj) 2 + (AjX-b-Lj) Aj (x-x) = ^Uj (Aj X-b x j ) + Uj(AjX-AjX) 

= Uj [AjX- (b x j J+^AjX-bQj -AjX+(b X j) ] 

1 

= Uj[AjX-b X j - 2 ( A j x_b l j ) ] 

1 - o 

= Uj(AjX-b X j) - 2 u j 

. 1 +9 

Convexity is proved if j [(AjX-b^j) > Uj (AjX- 

1 9 

b;Lj) - 2 u j ’ Two cases occur: 
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For all x | (AjX-b 1 j) + = 0, then AjX-b^j < 0 and we 
have j ( (AjX-bxj ) + ) 2 = 0 > UjfAjX-bxj) - \ Uj 2 = 

{ (AjX-b 1 j) + ] 2 + (AjX-b-Lj ) + Aj (x-x) . 

For all x | (AjX-b 1 j) + > 0, we let Uj = (AjX-bxj) 
and note (Uj-Uj) z > 0. 

n A O A O 9 

But (Uj-Uj)^ = Uj z - 2ujUj + Uj z > 0, and u j z >- 
Uj 2 +2uj-uj, so multiplying by h and restoring 
values, we have 

I h[ (AjX-b 1 j) + ] 2 >| h[ (AjX-b-LjJ+l 2 + h (AjX-bxj ) + Aj (x-x) 

Since Pj(h,x) is convex for all j and the summation 
of convex functions is convex, q(h,x) is convex. 
QED 
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